OPTIMAL SCALING FOR PARTIALLY UPDATING
MCMC ALGORITHMS 

By Peter Neal and Gareth Roberts 
University of Manchester and Lancaster University 

In this paper we shall consider optimal scaling problems for
high-dimensional Metropolis-Hastings algorithms where updates can be
chosen to be lower dimensional than the target density itself. We find
that the optimal scaling rule for the Metropolis algorithm, which
tunes the overall algorithm acceptance rate to be 0.234, holds for the
so-called Metropolis-within-Gibbs algorithm as well. Furthermore,
the optimal efficiency obtainable is independent of the dimensionality
of the update rule. This has important implications for the
MCMC practitioner: since high-dimensional updates are generally
computationally more demanding, lower-dimensional updates are to be
preferred. Similar results, with rather different conclusions, are
given for so-called Langevin updates. In this case, it is found that
high-dimensional updates are frequently most efficient, even taking
into account computing costs.

1. Introduction. There exist large classes of Markov chain Monte Carlo
(MCMC) algorithms for exploring high-dimensional (target) distributions.
All methods construct Markov chains with invariant distribution given by
the target distribution of interest. However, to maximize the efficiency
of the algorithm for Monte Carlo use, it is imperative to design
algorithms that give rise to Markov chains that mix sufficiently rapidly.
Since all Metropolis-Hastings algorithms require the specification of a
proposal distribution, these implementational questions can all be phrased
in terms of proposal choice. This paper is about two of these choices: the
scaling and dimensionality of the proposal. We shall work throughout with
continuous distributions, although it is envisaged that more general
distributions might be amenable to similar study.
One important decision the MCMC user has to make in a d-dimensional
problem concerns the dimensionality of the proposed jump. For instance,
two extreme types of algorithm are the following: propose a fully
d-dimensional update of the current state (according to a density with
respect to d-dimensional Lebesgue measure) and accept or reject it according
to the Metropolis-Hastings acceptance probability; or, for each of the
d components in turn, update that component conditional on all the others
according to some Markov chain which preserves the appropriate conditional
distribution. The most widely used examples are the d-dimensional
Metropolis algorithm at one extreme, and the Gibbs sampler or some kind
of "Metropolis-within-Gibbs" scheme at the other. Between these two
options lie many intermediate strategies. An important question is
whether any general statements can be made about algorithm choice in this
context, leading to practical advice for MCMC practitioners.

In this paper we concentrate on two types of algorithm: Metropolis and
Metropolis adjusted Langevin algorithms (MALA). We consider strategies
which update a fixed proportion, c, of components at each iteration, and
consider the efficiency of the resulting algorithms asymptotically as
$d \to \infty$. To do this, we shall extend the methodology developed in
[6, 7] to our context. The analysis produces clear-cut results which suggest
that, while full-dimensional Langevin updates are worthwhile,
full-dimensional Metropolis updates are asymptotically no better than
lower-dimensional updating schemes, so that the possible extra
computational overhead associated with their implementation always makes
them suboptimal in practice. All this is initially done in the context of
target densities consisting of independent components, which leads
naturally to the question of whether this simple picture is altered in any
way in the presence of dependence. Although this is difficult to explore in
full generality, we do later consider this problem in the context of a class
of dependent Gaussian target distributions, where explicit results can be
shown and where the conclusions from the independent component case remain
valid.

It is now well recognized that highly correlated target distributions lead
to slow mixing for updating schemes where c < 1 (see, e.g., [5, 9]).
However, it is also known that spherically symmetric proposal distributions
in d dimensions can mix slowly on highly correlated target densities, since
the proposal distribution is inappropriately shaped to explore the target
(see [8]). So for highly correlated target distributions, both high- and
low-dimensional updating strategies perform poorly. We shall explore these
two competing effects in a Gaussian context where explicit calculations are
possible. Our work shows that, for the Metropolis algorithm, these two
slowing-down effects are the same for every c > 0. In particular, this implies
that the commonly used strategy of getting around high-correlation problems
by block updating using Metropolis has no justification. In contrast, for
MALA, full-dimensional updating, c = 1, is shown to be optimal.

The paper is structured as follows. In Section 2 we outline the MCMC
setup. In Sections 3 and 4 we tackle the problem of scaling the variance of
the proposal distribution for RWM-within-Gibbs (random walk
Metropolis-within-Gibbs) and MALA-within-Gibbs (Metropolis adjusted
Langevin-within-Gibbs), respectively. The approach taken is similar to that
used for the full RWM/MALA algorithms: we obtain weak convergence to an
appropriate Langevin diffusion as the dimension of the state space, d,
converges to infinity. The results of Sections 3 and 4 are proved for a
sequence of d-dimensional product densities of the form

$$\pi_d(x^d) = \prod_{i=1}^{d} f(x_i^d) \tag{1.1}$$

for some suitably smooth probability density $f(\cdot)$. In both Sections 3
and 4, for each fixed one-dimensional component of $\{X^d; d \ge 1\}$, the
one-dimensional process converges weakly to an appropriate Langevin
diffusion. The aim, therefore, is to scale the proposal variances so as to
maximize the speed of the limiting Langevin diffusion. Since the components
of $X^d$ are independent and identically distributed, we shall prove the
results for the first component, $\{X_1^d; d \ge 1\}$.

It is at least plausible that the picture will be very different when
considering dependent densities. However, from theoretical analysis in the
limiting cases where results can be obtained, and from simulations in more
general cases, we find that the general conclusions derived for densities of
the form (1.1) extend some way toward dependent densities. To this end, in
Section 5, we consider RWM/MALA-within-Gibbs for the exchangeable normal
$X^d \sim N(0, \Sigma_\rho^d)$, where $\sigma_{ii}^d = 1$, $1 \le i \le d$, and
$\sigma_{ij}^d = \sigma_{ji}^d = \rho$, $1 \le i < j \le d$. (Throughout the paper,
we adopt the notation that $\Sigma$ will be used for variance matrices, while
elements of matrices will be denoted by $\sigma$, both conventions using
appropriate sub- and superscripts.)

All the proofs of the theorems in Sections 3-5 are given in the Appendix.
Then, in Section 6, with the aid of a simulation study, we demonstrate that
the asymptotic results are practically useful for finite d, namely, $d \ge 10$.

2. Algorithms and preliminaries. For RWM/MALA, we are interested in
$(d, \sigma_d^2)$: the dimension of the state space, d, and the proposal
variance, $\sigma_d^2$, where the proposal for the ith component is given by

$$
\begin{aligned}
Y_i^d &= x_i^d + \sigma_d Z_i, && 1 \le i \le d, \quad \text{RWM},\\
Y_i^d &= x_i^d + \sigma_d Z_i + \frac{\sigma_d^2}{2}\frac{\partial}{\partial x_i}\log\pi_d(x^d), && 1 \le i \le d, \quad \text{MALA},
\end{aligned}
$$
and the $Z_i$'s are independent and identically distributed according to
$Z \sim N(0,1)$. For both RWM and MALA, the maximum speed of the limiting
diffusion is obtained by taking the proposal variance to be of the form
$\sigma_d^2 = l^2 d^{-s}$ for some $l > 0$ and $s > 0$ (for RWM, $s = 1$, and for MALA, $s = 1/3$).

Now for RWM/MALA-within-Gibbs, the basic idea is to choose $dc_d$
components at random at each iteration and attempt to update them jointly
according to the RWM/MALA mechanism, respectively. We sometimes write
$\sigma_d^2 = \sigma_{d,c_d}^2$, where $c_d$ represents the proportion of components
updated at each iteration. Thus, the two algorithms propose new values as
follows:

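$$
\begin{aligned}
Y_i^d &= x_i^d + \chi_i^d \sigma_{d,c_d} Z_i, && 1 \le i \le d, \quad \text{RWM-within-Gibbs},\\
Y_i^d &= x_i^d + \chi_i^d \Big\{\sigma_{d,c_d} Z_i + \frac{\sigma_{d,c_d}^2}{2}\frac{\partial}{\partial x_i}\log\pi_d(x^d)\Big\}, && 1 \le i \le d, \quad \text{MALA-within-Gibbs},
\end{aligned}
\tag{2.1}
$$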

where the $Z_i$'s are independent and identically distributed according to
$Z \sim N(0,1)$, and the $\chi_i^d$ are chosen as follows. Independently of the
$Z_i$'s, we select at random a subset A, say, of size $dc_d$ from
$\{1, 2, \ldots, d\}$, setting $\chi_i^d = 1$ if $i \in A$ and $\chi_i^d = 0$
otherwise. The proposal $Y^d$ is then accepted according to the usual
Metropolis-Hastings acceptance probability

$$\alpha_d(x^d, y^d) = 1 \wedge \frac{\pi_d(y^d)\,q(y^d, x^d)}{\pi_d(x^d)\,q(x^d, y^d)}, \tag{2.2}$$

where $q(\cdot,\cdot)$ is the proposal density, in which case we set $X_n^d = Y^d$.
Otherwise, we set $X_n^d = X_{n-1}^d$.
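As a concrete illustration of (2.1) and (2.2), the following is a minimal sketch of one RWM-within-Gibbs iteration. It assumes the target is supplied as a log-density `log_pi` known up to an additive constant, and that the subset size $dc_d$ is rounded to the nearest integer (at least 1). The function and argument names are hypothetical. Because the random-walk proposal, including the uniformly chosen subset A, is symmetric, q cancels from (2.2).

```python
import math
import random


def rwm_within_gibbs_step(x, log_pi, sigma, c, rng):
    """One RWM-within-Gibbs iteration following (2.1)-(2.2).

    x      -- current state, a list of d floats
    log_pi -- log target density, up to an additive constant
    sigma  -- proposal scale sigma_{d,c_d}
    c      -- proportion of components to update (subset size rounded, at least 1)
    rng    -- a random.Random instance
    Returns (next_state, accepted_flag).
    """
    d = len(x)
    k = max(1, round(d * c))
    active = rng.sample(range(d), k)          # the random subset A, chi_i = 1 on A
    y = list(x)
    for i in active:
        y[i] = x[i] + sigma * rng.gauss(0.0, 1.0)
    # Symmetric proposal: q(x, y) = q(y, x), so only the target ratio remains.
    log_ratio = log_pi(y) - log_pi(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y, True
    return list(x), False


def log_std_normal_product(x):
    """Illustrative target: product of N(0, 1) densities (up to a constant)."""
    return -0.5 * sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(1)
    d, c = 50, 0.5
    l = 2.38 / math.sqrt(c)                   # illustrative choice of l
    sigma = l / math.sqrt(d)                  # sigma^2_{d,c_d} = l^2 / d (s = 1)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n, accepted = 5000, 0
    for _ in range(n):
        x, acc = rwm_within_gibbs_step(x, log_std_normal_product, sigma, c, rng)
        accepted += acc
    print("empirical acceptance rate:", accepted / n)
```

Only the coordinates in A can change, and a rejected proposal leaves the whole state unchanged, which is exactly the partial-updating mechanism described above.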

In both cases, the algorithms simulate Markov chains which are reversible
with respect to $\pi_d$, and they can easily be shown to be $\pi_d$-irreducible and
aperiodic. Therefore, both algorithms converge in total variation distance
to $\pi_d$. Here, however, we investigate optimization of the algorithms for
rapid convergence. To find a manageable framework for assessing optimality,
Roberts, Gelman and Gilks [6] introduced the notion of the average
acceptance rate, which measures the steady-state proportion of accepted
proposals for the algorithm and which can be shown to be closely connected
with the notion of algorithm efficiency and optimality. Specifically, we
define

$$a_d^{c_d}(l) = E_{\pi_d}\big[\alpha_d(X^d, Y^d)\big] = E\left[1 \wedge \frac{\pi_d(Y^d)\,q(Y^d, X^d)}{\pi_d(X^d)\,q(X^d, Y^d)}\right], \tag{2.3}$$

where $\sigma_{d,c_d}^2 = l^2 d^{-s}$, $X^d \sim \pi_d$ and $Y^d$ represents the subsequent
proposal random variable. Thus, $a_d^{c_d}(l)$ is the $\pi_d$-average acceptance
rate of the above algorithms when we update a proportion $c_d$ of the d
components in each iteration. We adopt the general notational convention
that, for any d-dimensional stochastic process $W^d$, we shall write
$W_{t,i}^d$ for the value of its ith component at time t.
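To make (2.3) concrete, the sketch below estimates $a_d^{c_d}(l)$ for RWM-within-Gibbs by plain Monte Carlo, under the illustrative assumptions that $\pi_d$ is a product of N(0,1) densities (so $X^d \sim \pi_d$ can be drawn exactly) and $s = 1$. For a product target only the $dc_d$ updated coordinates enter the ratio in (2.3), and by exchangeability they can be taken to be the first $dc_d$. The function name is hypothetical. For comparison, it also evaluates $2\Phi(-l\sqrt{c}/2)$, the limiting value stated in Corollary 3.2(i) when $I = 1$, as is the case for the standard normal.

```python
import math
import random


def Phi(z):
    """Standard normal c.d.f., computed with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mc_acceptance_rate(d, c, l, n, rng):
    """Monte Carlo estimate of a_d^{c_d}(l) in (2.3) for RWM-within-Gibbs.

    Illustrative assumptions: pi_d = N(0,1)^d and sigma^2 = l^2 / d.
    """
    k = max(1, round(d * c))
    sigma = l / math.sqrt(d)
    total = 0.0
    for _ in range(n):
        log_ratio = 0.0
        for _ in range(k):
            xi = rng.gauss(0.0, 1.0)                  # X_i drawn from the target
            yi = xi + sigma * rng.gauss(0.0, 1.0)     # proposed value
            log_ratio += -0.5 * (yi * yi - xi * xi)   # log pi(y_i) - log pi(x_i)
        total += 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    return total / n


if __name__ == "__main__":
    d, c, l = 200, 0.5, 2.38
    est = mc_acceptance_rate(d, c, l, 10000, random.Random(42))
    limit = 2.0 * Phi(-l * math.sqrt(c) / 2.0)
    print(f"Monte Carlo estimate: {est:.3f}, limiting value: {limit:.3f}")
```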
Our aim in this paper is to consider the optimization [in $(c_d, \sigma_{d,c_d}^2)$]
of the algorithms' speed of convergence. For convenience (although to some
extent this assumption can be relaxed), we shall assume that $c_d \to c$ as
$d \to \infty$ for some $0 < c \le 1$. It turns out to be both convenient and
practical to express many of the optimality solutions in terms of acceptance
rate criteria.



3. RWM-within-Gibbs for IID product densities. We shall first consider
the RWM algorithm applied initially to a simple IID form target density.
This allows us to obtain explicit asymptotic results for optimal
high-dimensional algorithms. The results of this section can be seen as an
extension of Theorems 1.1 and 1.2 of [6], which consider the
full-dimensional update case.

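Let

$$\pi_d(x^d) = \prod_{i=1}^d f(x_i^d) = \prod_{i=1}^d \exp\{g(x_i^d)\} \tag{3.1}$$

be a d-dimensional product density with respect to Lebesgue measure. Let
the proposal standard deviation be $\sigma_{d,c_d} = l\,d^{-1/2}$ for some $l > 0$.

For $d \ge 1$, let $\mathbf{U}_t^d = (X_{[dt],1}^d, X_{[dt],2}^d, \ldots, X_{[dt],d}^d)$, so that
$\mathbf{U}_{t,i}^d = X_{[dt],i}^d$, $1 \le i \le d$, and let $U_t^d = \mathbf{U}_{t,1}^d$ denote its
first component.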



Theorem 3.1. Suppose that f is positive and $C^3$ (a three-times
differentiable function with continuous third derivative) and that
$(\log f)' = g'$ is Lipschitz. Suppose also that $c_d \to c$ as $d \to \infty$,
for some $0 < c \le 1$,

$$E\left[\left(\frac{f'(X)}{f(X)}\right)^8\right] < \infty \tag{3.2}$$

and

$$E\left[\left(\frac{f''(X)}{f(X)}\right)^4\right] < \infty. \tag{3.3}$$

Let $X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$ be such that all of its components are
distributed according to f, and assume that $X_{0,i}^j = X_{0,i}^i$ for all $i \le j$.
Then, as $d \to \infty$,

$$U^d \Rightarrow U, \tag{3.4}$$

where $U_0$ is distributed according to f and U satisfies the Langevin SDE

$$dU_t = (h_c(l))^{1/2}\,dB_t + \tfrac{1}{2}h_c(l)\,g'(U_t)\,dt \tag{3.5}$$

and

$$h_c(l) = 2cl^2\,\Phi\left(-\frac{l\sqrt{cI}}{2}\right),$$

with $\Phi$ being the standard normal c.d.f. and

$$I = E\left[\left(\frac{f'(X)}{f(X)}\right)^2\right].$$

The following corollary holds.

Corollary 3.2. Let $c_d \to c$ as $d \to \infty$, for some $0 < c \le 1$. Then:

(i) $\lim_{d\to\infty} a_d^{c_d}(l) = a^c(l) \equiv 2\Phi\left(-\frac{l\sqrt{cI}}{2}\right)$.

(ii) Let $\hat l$ be the unique value of l which maximizes $h_1(l) = 2l^2\Phi\left(-\frac{l\sqrt{I}}{2}\right)$ on
$[0,\infty)$, and let $\hat l_c$ be the unique value of l which maximizes $h_c(l)$ on $[0,\infty)$.
Then $\hat l_c = c^{-1/2}\hat l$ and $h_c(\hat l_c) = h_1(\hat l)$.

(iii) For all $0 < c \le 1$, the optimal acceptance rate $a^c(\hat l_c) = 0.234$ (to three
decimal places).

Though these results involve fairly technical mathematical statements,
they yield a very simple practical conclusion: the optimal efficiency
obtainable for a given c does not depend on c at all. Indeed, substituting
$m = l\sqrt{c}$ shows that $h_c(l) = h_1(l\sqrt{c})$, so rescaling l exactly compensates
for updating fewer components. Since, in practice, the computational
overhead of one iteration of the algorithm is nondecreasing as a function of
c, smaller values of c should be preferred. Therefore, for RWM, using
high-dimensional update steps does not make any sense.
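Corollary 3.2(ii) and (iii) can be checked numerically. The sketch below maximizes $h_c(l) = 2cl^2\Phi(-l\sqrt{cI}/2)$ with a coarse grid search followed by golden-section refinement (a generic optimizer chosen here for simplicity, assuming $h_c$ is unimodal on the search interval) and reports $\hat l_c$, $h_c(\hat l_c)$ and the limiting acceptance rate $2\Phi(-\hat l_c\sqrt{cI}/2)$. Taking $I = 1$ is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math


def Phi(z):
    """Standard normal c.d.f. (erfc keeps tail values positive)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def maximize(f, lo, hi, n_grid=400, tol=1e-12):
    """Maximize a unimodal f on [lo, hi]: grid bracket, then golden-section search."""
    step = (hi - lo) / n_grid
    grid = [lo + i * step for i in range(n_grid + 1)]
    best = max(range(n_grid + 1), key=lambda i: f(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):        # maximum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - inv_phi * (b - a)
        else:                    # maximum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def h_rwm(l, c, I=1.0):
    """Speed of the limiting diffusion, h_c(l) = 2 c l^2 Phi(-l sqrt(cI) / 2)."""
    return 2.0 * c * l * l * Phi(-0.5 * l * math.sqrt(c * I))


def accept_rwm(l, c, I=1.0):
    """Limiting acceptance rate a^c(l) = 2 Phi(-l sqrt(cI) / 2)."""
    return 2.0 * Phi(-0.5 * l * math.sqrt(c * I))


def optimize_rwm(c, I=1.0):
    l_opt = maximize(lambda l: h_rwm(l, c, I), 0.0, 20.0)
    return l_opt, h_rwm(l_opt, c, I), accept_rwm(l_opt, c, I)


if __name__ == "__main__":
    l1, h1, _ = optimize_rwm(1.0)
    for c in (0.25, 0.5, 0.75, 1.0):
        lc, hc, ac = optimize_rwm(c)
        print(f"c={c:.2f}  l_c={lc:.4f}  l_1/sqrt(c)={l1 / math.sqrt(c):.4f}  "
              f"h_c={hc:.5f}  h_1={h1:.5f}  acceptance={ac:.4f}")
```

The printed table should show $\hat l_c$ growing like $c^{-1/2}$ while the maximal speed and the optimal acceptance rate stay fixed, which is the content of Corollary 3.2.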

It is, of course, important to see how these conclusions extend to more
general target densities and, in particular, ones which exhibit dependence
structure. Some theory and related simulation studies in Sections 5 and 6,
respectively, will demonstrate that these findings extend considerably beyond
the rigorous but restrictive setup of Theorem 3.1.



4. MALA-within-Gibbs for IID product densities. We now turn our attention
to MALA-within-Gibbs. We again consider a sequence of probability densities
$\pi_d$ of the form given in (3.1). We follow [7] in making the following
assumptions. We assume that $X_0^d$ is distributed according to the
stationary measure $\pi_d$, and that g is an eight times continuously
differentiable function with derivatives $g^{(i)}$ satisfying

$$|g(x)|,\ |g^{(i)}(x)| \le C(1 + |x|^K), \qquad 1 \le i \le 8, \tag{4.1}$$

for some $C, K > 0$, and that

$$\int_{\mathbb{R}} x^k f(x)\,dx < \infty, \qquad k = 1, 2, \ldots. \tag{4.2}$$

Finally, we assume that g' is Lipschitz. This ensures that the limiting
Langevin diffusion is nonexplosive (see, e.g., [12], Chapter V, Theorem 52.1).
Let $\{J_t^d\}$ be a Poisson process with rate $d^{1/3}$, and let $\Gamma^d = \{\Gamma_t^d\}_{t \ge 0}$ be
the d-dimensional jump process defined by $\Gamma_t^d = X_{J_t^d}^d$, where we take
$\sigma_d^2 = l^2 d^{-1/3}$ with l an arbitrary constant.
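For MALA-within-Gibbs the proposal in (2.1) is not symmetric, so the proposal density q does not cancel in (2.2). The following is a minimal sketch under these assumptions: the user supplies `log_pi` and its gradient `grad_log_pi` (hypothetical names); Gaussian proposal terms are written up to constants that cancel in the ratio; and the subset A, which is equally likely in both directions, is held fixed within an iteration. The detailed-balance identity $\pi_d(x)q(x,y)\alpha_d(x,y) = \pi_d(y)q(y,x)\alpha_d(y,x)$ is what the acceptance step is designed to enforce.

```python
import math
import random


def mala_drift_proposal(x, grad_log_pi, sigma, active, rng):
    """Draw Y from (2.1): move only the active coordinates."""
    g = grad_log_pi(x)
    y = list(x)
    for i in active:
        y[i] = x[i] + sigma * rng.gauss(0.0, 1.0) + 0.5 * sigma * sigma * g[i]
    return y


def log_q(x_from, x_to, grad_log_pi, sigma, active):
    """Log proposal density of x_to given x_from on the active coordinates,
    up to an additive constant (the constant and P(A) cancel in (2.2))."""
    g = grad_log_pi(x_from)
    total = 0.0
    for i in active:
        mean = x_from[i] + 0.5 * sigma * sigma * g[i]
        total -= (x_to[i] - mean) ** 2 / (2.0 * sigma * sigma)
    return total


def log_accept(x, y, log_pi, grad_log_pi, sigma, active):
    """log alpha_d(x, y) from (2.2)."""
    log_r = (log_pi(y) + log_q(y, x, grad_log_pi, sigma, active)
             - log_pi(x) - log_q(x, y, grad_log_pi, sigma, active))
    return min(0.0, log_r)


def mala_within_gibbs_step(x, log_pi, grad_log_pi, sigma, c, rng):
    d = len(x)
    active = rng.sample(range(d), max(1, round(d * c)))   # the random subset A
    y = mala_drift_proposal(x, grad_log_pi, sigma, active, rng)
    if rng.random() < math.exp(log_accept(x, y, log_pi, grad_log_pi, sigma, active)):
        return y, True
    return list(x), False


if __name__ == "__main__":
    def log_pi(x):                       # illustrative N(0,1)^d target
        return -0.5 * sum(v * v for v in x)

    def grad_log_pi(x):
        return [-v for v in x]

    rng = random.Random(4)
    d, c, l = 50, 0.5, 1.0
    sigma = l * d ** (-1.0 / 6.0)        # sigma^2 = l^2 d^{-1/3}
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    accepted = 0
    for _ in range(2000):
        x, acc = mala_within_gibbs_step(x, log_pi, grad_log_pi, sigma, c, rng)
        accepted += acc
    print("empirical acceptance rate:", accepted / 2000)
```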

We then have the following two theorems which are extensions of [7], 
Theorems 1 and 2. 



Theorem 4.1. Suppose that $c_d \to c$ as $d \to \infty$, for some $0 < c \le 1$. Then

$$\lim_{d\to\infty} a_d^{c_d}(l) = a^c(l) \equiv 2\Phi\left(-\frac{\sqrt{c}\,K l^3}{2}\right),$$

with $K^2 = E\left[\frac{5g'''(X)^2 - 3g''(X)^3}{48}\right] > 0$.

Theorem 4.2. Suppose that $c_d \to c$ as $d \to \infty$ for some $0 < c \le 1$. Let
$\{U_t^d\}_{t \ge 0}$ be the process corresponding to the first component of $\Gamma^d$.
Then, as $d \to \infty$, the process $U^d$ converges weakly (in the Skorokhod
topology) to the Langevin diffusion U defined by

$$dU_t = h_c(l)^{1/2}\,dB_t + \tfrac{1}{2}h_c(l)\,g'(U_t)\,dt,$$

where $h_c(l) = 2cl^2\,\Phi\left(-\frac{\sqrt{c}\,K l^3}{2}\right)$ is the speed of the limiting diffusion.

The most important consequence of Theorems 4.1 and 4.2 is the following 
corollary. 

Corollary 4.3. Let $c_d \to c$ as $d \to \infty$, for some $0 < c \le 1$. Then:

(i) Let $\hat l$ be the unique value of l which maximizes $h_1(l) = 2l^2\Phi\left(-\frac{Kl^3}{2}\right)$
on $[0,\infty)$, and let $\hat l_c$ be the unique value of l which maximizes $h_c(l)$ on $[0,\infty)$.
Then $\hat l_c = c^{-1/6}\hat l$ and $h_c(\hat l_c) = c^{2/3}h_1(\hat l)$.

(ii) For all $0 < c \le 1$, the optimal acceptance rate $a^c(\hat l_c) = 0.574$ (to three
decimal places).
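Corollary 4.3 can be checked in the same way as Corollary 3.2. The sketch below assumes $K = 1$ (any $K > 0$ only rescales l), maximizes $h_c(l) = 2cl^2\Phi(-\sqrt{c}Kl^3/2)$ with a generic grid-plus-golden-section search, and compares $\hat l_c$ and $h_c(\hat l_c)$ with $c^{-1/6}\hat l$ and $c^{2/3}h_1(\hat l)$. The optimizer is an illustrative choice and the helper names are hypothetical.

```python
import math


def Phi(z):
    """Standard normal c.d.f. (erfc keeps tail values positive)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def maximize(f, lo, hi, n_grid=400, tol=1e-12):
    """Maximize a unimodal f on [lo, hi]: grid bracket, then golden-section search."""
    step = (hi - lo) / n_grid
    grid = [lo + i * step for i in range(n_grid + 1)]
    best = max(range(n_grid + 1), key=lambda i: f(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):
            b, x2 = x2, x1
            x1 = b - inv_phi * (b - a)
        else:
            a, x1 = x1, x2
            x2 = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def h_mala(l, c, K=1.0):
    """h_c(l) = 2 c l^2 Phi(-sqrt(c) K l^3 / 2), as in Theorem 4.2."""
    return 2.0 * c * l * l * Phi(-0.5 * math.sqrt(c) * K * l ** 3)


def accept_mala(l, c, K=1.0):
    """Limiting acceptance rate a^c(l) = 2 Phi(-sqrt(c) K l^3 / 2)."""
    return 2.0 * Phi(-0.5 * math.sqrt(c) * K * l ** 3)


def optimize_mala(c, K=1.0):
    l_opt = maximize(lambda l: h_mala(l, c, K), 0.0, 4.0)
    return l_opt, h_mala(l_opt, c, K), accept_mala(l_opt, c, K)


if __name__ == "__main__":
    l1, h1, _ = optimize_mala(1.0)
    for c in (0.25, 0.5, 0.75, 1.0):
        lc, hc, ac = optimize_mala(c)
        print(f"c={c:.2f}  l_c={lc:.4f}  c^(-1/6) l_1={c ** (-1 / 6) * l1:.4f}  "
              f"h_c={hc:.5f}  c^(2/3) h_1={c ** (2 / 3) * h1:.5f}  acceptance={ac:.4f}")
```

Unlike the RWM case, the maximal speed now decreases as c decreases, which is why full-dimensional MALA updates are optimal before computing costs are considered.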

Thus, in stark contrast to the RWM case, it is optimal to update all
components at once for MALA, since $h_c(\hat l_c) = c^{2/3}h_1(\hat l)$ is increasing in c.
The story is somewhat more complicated when computational overheads are
taken into account. For instance, it is common for the computational cost of
one iteration of MALA-within-Gibbs to be approximately $d(a + bc)$ for
constants a and b. To see this, note that the algorithm's computational cost
is often dominated by two operations: the calculation of the various
derivatives needed to propose a new value, and the evaluation of $\pi_d$ at the
proposed new value. The first of these operations involves a cd-dimensional
update and typically takes a time of order cd, while the second involves
evaluating a d-dimensional function, which we would expect to be at least
of order d. (Although, in some important special cases, target density ratios
might be computed more efficiently than this.) In this case the overall
efficiency is obtained by maximizing

$$\frac{c^{2/3}}{a + bc}.$$

Setting the derivative with respect to c to zero gives $\frac{2}{3}(a + bc) = bc$, that
is, $c = 2a/b$, so this expression is maximized over $(0,1]$ at $1 \wedge 2a/b$.
Therefore, it is conceivable for full-dimensional updates to be optimal even
when computational costs are taken into account. In any case, the optimal
proportion will be some value $c^* \in (0,1]$.

5. RWM/MALA-within-Gibbs on dependent target distributions. We are now
interested in the extent to which the results of the last two sections can be
extended to the case where the d components are dependent. It is difficult
to obtain general results, but certain important special cases can be
examined explicitly, yielding interesting results which imply (essentially)
that the extent to which the dependence structure affects the mixing
properties of the chain (RWM-within-Gibbs or MALA-within-Gibbs) is
independent of c. The most tractable special case is the Gaussian target
distribution. However, in Section 6, we shall also include some simulations in
other cases to show that the above statement holds well beyond the cases for
which rigorous mathematical results can be proved.

We begin with RWM-within-Gibbs and consider the optimal scaling problem
for the variance of the proposal distribution for a target distribution
consisting of exchangeable normal components. Specifically,
$X^d \sim N_d(0, \Sigma_\rho^d)$, where $\sigma_{ii}^d = 1$, $1 \le i \le d$, and $\sigma_{ij}^d = \rho$, $i \ne j$, for
some $0 < \rho < 1$. Therefore, we have that

$$\pi_d(x^d) = (2\pi)^{-d/2}\det|\Sigma_\rho^d|^{-1/2}\exp\left\{-\frac{\theta_d}{2}\sum_{i=1}^d (x_i^d)^2 - \phi_d\sum_{1 \le i < j \le d} x_i^d x_j^d\right\},$$

say, where $\theta_d$ and $\phi_d$ are the diagonal and off-diagonal entries of $(\Sigma_\rho^d)^{-1}$:

$$\theta_d = \frac{1 + (d-2)\rho}{1 + (d-2)\rho - (d-1)\rho^2}
\qquad \text{and} \qquad
\phi_d = \frac{-\rho}{1 + (d-2)\rho - (d-1)\rho^2}.$$
For $d \ge 1$, let $\mathbf{U}_t^d = (X_{[dt],1}^d, X_{[dt],2}^d, \ldots, X_{[dt],d}^d)$. Let
$U_t^d = (U_{t,1}^d, U_{t,2}^d, U_{t,3}^d)$ be such that $U_{t,1}^d = \mathbf{U}_{t,1}^d$, $U_{t,2}^d = \mathbf{U}_{t,2}^d$ and
$U_{t,3}^d = \frac{1}{d-2}\sum_{i=3}^d \mathbf{U}_{t,i}^d$.
Now the proposal $Y^d$ is given by

$$Y_i^d = x_i^d + \sigma_d\chi_i^d Z_i, \qquad 1 \le i \le d,$$

where the $Z_i$ and $\chi_i^d$ ($1 \le i \le d$) are defined as before and $\sigma_d^2 = \frac{l^2}{d-2}$
for some constant l. [We use $(d-2)$ rather than d or $(d-1)$ for simplicity in
presentation of the results.]

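In the dependent case, more care needs to be taken in constructing the
sequence $\{X_0^d; d \ge 1\}$. Let $X_{0,1}^1 \sim N(0,1)$ [i.e., $X_0^1$ is distributed
according to $\pi_1(\cdot)$]. For $d \ge 2$ and $1 \le i \le d-1$, set $X_{0,i}^d = X_{0,i}^{d-1}$.
Then iteratively define

$$X_{0,d}^d \sim N\left(\frac{\rho}{1+(d-2)\rho}\sum_{i=1}^{d-1} X_{0,i}^d,\ \frac{1+(d-2)\rho-(d-1)\rho^2}{1+(d-2)\rho}\right),$$

which is the conditional distribution of the dth component given the first
$d-1$ components under $N(0,\Sigma_\rho^d)$. Therefore, $X_0^d$ is distributed according to
$\pi_d(\cdot)$, and we can continue this process indefinitely to obtain
$X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$.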
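The sequential construction of $X_0^\infty$ can be sketched as follows. Each new coordinate is drawn from its conditional law given the previous ones under $N(0,\Sigma_\rho^d)$: normal with mean $\frac{\rho}{1+(d-2)\rho}\sum_{i<d}x_i$ and variance $\frac{1+(d-2)\rho-(d-1)\rho^2}{1+(d-2)\rho}$, which follows from standard Gaussian conditioning for the exchangeable covariance. Because earlier coordinates are never redrawn, the first d coordinates of a longer vector form a valid $X_0^d$. The function name is hypothetical.

```python
import math
import random


def exchangeable_normal_start(d, rho, rng):
    """Draw X_0^d ~ N(0, Sigma_rho^d) by adding one coordinate at a time.

    Coordinate m is drawn from its conditional law given coordinates 1..m-1,
    so the output for dimension d extends the output for dimension d - 1.
    """
    x = [rng.gauss(0.0, 1.0)]                    # X_{0,1}^1 ~ N(0, 1)
    for m in range(2, d + 1):                    # m = current dimension
        denom = 1.0 + (m - 2) * rho
        mean = rho / denom * sum(x)
        var = (1.0 + (m - 2) * rho - (m - 1) * rho ** 2) / denom
        x.append(rng.gauss(mean, math.sqrt(var)))
    return x


if __name__ == "__main__":
    rng = random.Random(7)
    print(exchangeable_normal_start(5, 0.5, rng))
```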

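Theorem 5.1. Suppose that $0 < \rho < 1$ and that $c_d \to c$, as $d \to \infty$, for
some $0 < c \le 1$. Let $X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$ be constructed as above. Let

$$D_1 = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & \rho \end{pmatrix}, \qquad
D_2 = D_1^{-1} = \frac{1}{1-\rho}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ -1 & -1 & \frac{1+\rho}{\rho} \end{pmatrix}, \qquad
D_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

and let $f(u)$ denote the probability density function of $N(0, D_1)$. Then, as
$d \to \infty$,

$$U^d \Rightarrow U,$$

where $U_0$ is distributed according to f and U satisfies the Langevin SDE

$$dU_t = (h_{c,\rho}(l))^{1/2} D_3\,dB_t + h_{c,\rho}(l)\,D_3\left\{\tfrac{1}{2}\nabla\left(-\tfrac{1}{2}U_t^T D_2 U_t\right)\right\}dt,$$

where

$$h_{c,\rho}(l) = 2cl^2\,\Phi\left(-\frac{l}{2}\sqrt{\frac{c}{1-\rho}}\right)$$

is the speed of the limiting diffusion.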
Note that if we define $I_d = E\left[\left(\frac{\partial}{\partial x_1}\log\pi_d(X^d)\right)^2\right]$ and
$I = \frac{1}{1-\rho}$, then $I_d \to I$ as $d \to \infty$ and $h_{c,\rho}(l) = 2cl^2\Phi\left(-\frac{l}{2}\sqrt{cI}\right)$.
Therefore, the speed of the limiting diffusion for the exchangeable normal has
the same form as that obtained for the IID product densities considered in Section 3.
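For the Gaussian target, $\frac{\partial}{\partial x_1}\log\pi_d(x^d) = -((\Sigma_\rho^d)^{-1}x^d)_1$, so $I_d$ equals the (1,1) entry of $(\Sigma_\rho^d)^{-1}$, which for the exchangeable matrix is $\frac{1+(d-2)\rho}{1+(d-2)\rho-(d-1)\rho^2}$. The sketch below checks this closed form against a small Gauss-Jordan inversion and shows it approaching $1/(1-\rho)$ as d grows. The helper names are hypothetical.

```python
def invert(a):
    """Inverse of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def exchangeable_cov(d, rho):
    """Sigma_rho^d: ones on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(d)] for i in range(d)]


def theta(d, rho):
    """(1,1) entry of (Sigma_rho^d)^{-1}, i.e. I_d for the exchangeable normal."""
    return (1 + (d - 2) * rho) / (1 + (d - 2) * rho - (d - 1) * rho ** 2)


if __name__ == "__main__":
    rho = 0.3
    print(invert(exchangeable_cov(6, rho))[0][0], theta(6, rho))
    for d in (10, 100, 1000, 10 ** 6):
        print(d, theta(d, rho), 1.0 / (1.0 - rho))
```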

As in (2.3), let $a_d^{c_d,\rho}(l)$ be the $\pi_d$-average acceptance rate of the above
algorithm, where $X^d \sim N(0, \Sigma_\rho^d)$, $\sigma_d = l/\sqrt{d-2}$, and we update a
proportion $c_d$ of the d components in each iteration. Then we have the
following corollary.

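Corollary 5.2. Let $c_d \to c$, as $d \to \infty$, for some $0 < c \le 1$. Then, for
$0 < \rho < 1$:

(i) $\lim_{d\to\infty} a_d^{c_d,\rho}(l) = a^{c,\rho}(l) \equiv 2\Phi\left(-\frac{l}{2}\sqrt{\frac{c}{1-\rho}}\right)$.

(ii) Let $\hat l$ be the unique value of l which maximizes $h_{1,0}(l) = 2l^2\Phi\left(-\frac{l}{2}\right)$
on $[0,\infty)$, and let $\hat l_{c,\rho}$ be the unique value of l which maximizes $h_{c,\rho}(l)$ on
$[0,\infty)$. Then $\hat l_{c,\rho} = \sqrt{\frac{1-\rho}{c}}\,\hat l$ and $h_{c,\rho}(\hat l_{c,\rho}) = (1-\rho)h_{1,0}(\hat l)$.

(iii) For all $0 < c \le 1$ and $0 < \rho < 1$, the optimal acceptance rate $a^{c,\rho}(\hat l_{c,\rho}) =
0.234$ (to three decimal places).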

Note that Corollary 5.2(ii) states that the cost incurred by having
$\sigma_{ij}^d = \rho$, $i \ne j$, rather than $\sigma_{ij}^d = 0$, $i \ne j$, is to slow down the speed of
the limiting diffusion by a factor of $1-\rho$, for all $0 < c \le 1$. In other words,
the cost incurred by the dependence between the components of $X^d$ is
independent of c. Furthermore, the optimal acceptance rate $a^{c,\rho}(\hat l_{c,\rho})$ is
unaffected by the introduction of dependence. We shall study this further in
the simulation study conducted in Section 6.
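The factor $1-\rho$ in Corollary 5.2(ii) can be confirmed numerically in the same spirit. The sketch maximizes $h_{c,\rho}(l) = 2cl^2\Phi(-\frac{l}{2}\sqrt{c/(1-\rho)})$ with a generic grid-plus-golden-section search (an illustrative choice, not the method of the paper) and compares the result with $\sqrt{(1-\rho)/c}\,\hat l$ and $(1-\rho)h_{1,0}(\hat l)$. The helper names are hypothetical.

```python
import math


def Phi(z):
    """Standard normal c.d.f. (erfc keeps tail values positive)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def maximize(f, lo, hi, n_grid=400, tol=1e-12):
    """Maximize a unimodal f on [lo, hi]: grid bracket, then golden-section search."""
    step = (hi - lo) / n_grid
    grid = [lo + i * step for i in range(n_grid + 1)]
    best = max(range(n_grid + 1), key=lambda i: f(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):
            b, x2 = x2, x1
            x1 = b - inv_phi * (b - a)
        else:
            a, x1 = x1, x2
            x2 = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def h_exch(l, c, rho):
    """h_{c,rho}(l) = 2 c l^2 Phi(-(l/2) sqrt(c / (1 - rho)))."""
    return 2.0 * c * l * l * Phi(-0.5 * l * math.sqrt(c / (1.0 - rho)))


def accept_exch(l, c, rho):
    """Limiting acceptance rate a^{c,rho}(l) from Corollary 5.2(i)."""
    return 2.0 * Phi(-0.5 * l * math.sqrt(c / (1.0 - rho)))


def optimize_exch(c, rho):
    l_opt = maximize(lambda l: h_exch(l, c, rho), 0.0, 20.0)
    return l_opt, h_exch(l_opt, c, rho), accept_exch(l_opt, c, rho)


if __name__ == "__main__":
    l_hat, h_hat, _ = optimize_exch(1.0, 0.0)
    for rho in (0.0, 0.5, 0.9):
        for c in (0.25, 1.0):
            l_opt, h_opt, a_opt = optimize_exch(c, rho)
            print(f"rho={rho} c={c}: h/h_hat={h_opt / h_hat:.4f} "
                  f"(1-rho={1 - rho:.2f}), acceptance={a_opt:.4f}")
```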

Note that in Theorem 5.1 the last row of the matrix $D_3$ is a row of zeros, so
the third component of the limiting process does not move on this time
scale. This implies that the mixing time of $\mathbf{1}^T X^d$ grows more rapidly than
$O(d)$ as $d \to \infty$. In [8], heuristic arguments and extensive simulations show
that the mixing time of $\mathbf{1}^T X^d$ is in fact $O(d^2)$. Theorem 5.3 below gives a
formal statement of this result. (The proof of Theorem 5.3 is similar to the
proof of Theorem 5.1 and is, hence, omitted.)

For $d \ge 1$, let $U_t^d = \frac{1}{d-2}\sum_{i=3}^d X_{[d^2 t],i}^d$.

Theorem 5.3. Suppose that $0 < \rho < 1$ and that $c_d \to c$, as $d \to \infty$, for
some $0 < c \le 1$. Let $X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$ be constructed as in the prelude
to Theorem 5.1. Then, as $d \to \infty$,

$$U^d \Rightarrow U,$$

where $U_0 \sim N(0,\rho)$ and U satisfies the Langevin SDE

$$dU_t = (h_{c,\rho}(l))^{1/2}\,dB_t + h_{c,\rho}(l)\left\{-\frac{1}{2\rho}U_t\right\}dt,$$

where $h_{c,\rho}(l) = 2cl^2\Phi\left(-\frac{l}{2}\sqrt{\frac{c}{1-\rho}}\right)$, as before.
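The limit in Theorem 5.3 is an Ornstein-Uhlenbeck process, $dU_t = h^{1/2}dB_t - \frac{h}{2\rho}U_t\,dt$ with $h = h_{c,\rho}(l)$. Its stationary law is $N(0,\rho)$, consistent with $U_0 \sim N(0,\rho)$, and the speed h only rescales time. A short Euler-Maruyama sketch makes this visible by checking that the long-run sample variance is close to $\rho$. The step size, run length, seed and function names are arbitrary illustrative choices.

```python
import math
import random


def simulate_limit_ou(h, rho, dt, n_steps, rng):
    """Euler-Maruyama path of dU = sqrt(h) dB - (h / (2 rho)) U dt, with U_0 ~ N(0, rho)."""
    u = rng.gauss(0.0, math.sqrt(rho))
    path = [u]
    noise_sd = math.sqrt(h * dt)
    drift_rate = h / (2.0 * rho)
    for _ in range(n_steps):
        u += -drift_rate * u * dt + noise_sd * rng.gauss(0.0, 1.0)
        path.append(u)
    return path


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)


if __name__ == "__main__":
    rho = 0.5
    for h in (0.5, 1.0, 2.0):
        path = simulate_limit_ou(h, rho, 0.01, 400_000, random.Random(9))
        print(f"h={h}: sample variance {sample_variance(path):.3f} (rho = {rho})")
```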



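We now turn our attention to MALA-within-Gibbs for the exchangeable
normal, so that the proposal $Y^d$ is now given by

$$Y_i^d = x_i^d + \chi_i^d\left\{\sigma_d Z_i + \frac{\sigma_d^2}{2}\frac{\partial}{\partial x_i}\log\pi_d(x^d)\right\}, \qquad 1 \le i \le d,$$

where we take $\sigma_d^2 = l^2 d^{-1/3}$ with l an arbitrary constant. Let $X_0^\infty$ be
constructed as outlined above for RWM-within-Gibbs. Let $\{J_t\}$ be a Poisson
process with rate $d^{1/3}$, and let $\Gamma^d = \{\Gamma_t^d\}_{t \ge 0}$ be the d-dimensional jump
process defined by $\Gamma_t^d = X_{J_t}^d$. Let $U_t^d = (U_{t,1}^d, U_{t,2}^d, U_{t,3}^d)$ be such that
$U_{t,1}^d = \Gamma_{t,1}^d$, $U_{t,2}^d = \Gamma_{t,2}^d$ and $U_{t,3}^d = \frac{1}{d-2}\sum_{i=3}^d \Gamma_{t,i}^d$.

Theorem 5.4. Suppose that $0 < \rho < 1$ and that $c_d \to c$, as $d \to \infty$, for
some $0 < c \le 1$. Let $X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$ be constructed as in the prelude
to Theorem 5.1. Let $D_1$, $D_2$, $D_3$ and f be as defined in Theorem 5.1. Then,
as $d \to \infty$,

$$U^d \Rightarrow U,$$

where $U_0$ is distributed according to f and U satisfies the Langevin SDE

$$dU_t = (h_{c,\rho}(l))^{1/2} D_3\,dB_t + h_{c,\rho}(l)\,D_3\left\{\tfrac{1}{2}\nabla\left(-\tfrac{1}{2}U_t^T D_2 U_t\right)\right\}dt,$$

where

$$h_{c,\rho}(l) = 2cl^2\,\Phi\left(-\frac{l^3}{8}\sqrt{\frac{c}{(1-\rho)^3}}\right)$$

is the speed of the limiting diffusion.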



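Note that if we define

$$K_d^2 = E\left[\frac{1}{48}\left\{5\left(\frac{\partial^3}{\partial x_1^3}\log\pi_d(X^d)\right)^2 - 3\left(\frac{\partial^2}{\partial x_1^2}\log\pi_d(X^d)\right)^3\right\}\right]$$

and $K^2 = \frac{1}{16}\left(\frac{1}{1-\rho}\right)^3$, then $K_d^2 \to K^2$ as $d \to \infty$ and
$h_{c,\rho}(l) = 2cl^2\Phi\left(-\frac{l^3}{2}\sqrt{c}\,K\right)$. Therefore, the speed of the limiting
diffusion for the exchangeable normal has the same form as that obtained
for the IID product densities considered in Section 4.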

As in (2.3), let $a_d^{c_d,\rho}(l)$ be the $\pi_d$-average acceptance rate of the above
algorithm, where $X^d \sim N(0, \Sigma_\rho^d)$, $\sigma_d^2 = l^2 d^{-1/3}$, and we update a proportion
$c_d$ of the d components in each iteration. Then we have the following corollary.
Corollary 5.5. Let $c_d \to c$, as $d \to \infty$, for some $0 < c \le 1$. Then, for
$0 < \rho < 1$:

(i) $\lim_{d\to\infty} a_d^{c_d,\rho}(l) = a^{c,\rho}(l) \equiv 2\Phi\left(-\frac{l^3}{8}\sqrt{\frac{c}{(1-\rho)^3}}\right)$.

(ii) Let $\hat l$ be the unique value of l which maximizes $h_{1,0}(l) = 2l^2\Phi\left(-\frac{l^3}{8}\right)$
on $[0,\infty)$, and let $\hat l_{c,\rho}$ be the unique value of l which maximizes $h_{c,\rho}(l)$ on
$[0,\infty)$. Then $\hat l_{c,\rho} = \sqrt{1-\rho}\,c^{-1/6}\hat l$ and $h_{c,\rho}(\hat l_{c,\rho}) = c^{2/3}(1-\rho)h_{1,0}(\hat l)$.

(iii) For all $0 < c \le 1$ and $0 < \rho < 1$, the optimal acceptance rate $a^{c,\rho}(\hat l_{c,\rho}) =
0.574$ (to three decimal places).

Note that Corollary 5.5(ii) states that the cost incurred by having
$\sigma_{ij}^d = \rho$, $i \ne j$, rather than $\sigma_{ij}^d = 0$, $i \ne j$, is to slow down the speed of
the limiting diffusion by a factor of $1-\rho$, for all $0 < c \le 1$. Therefore, the
dependence in the target distribution $\pi_d(\cdot)$ affects convergence of
MALA-within-Gibbs in the same way that it affects RWM-within-Gibbs. The
cost associated with updating only a proportion c rather than all of the
components, a factor of $c^{2/3}$, is the same as that observed in Section 4.
Furthermore, the optimal acceptance rate $a^{c,\rho}(\hat l_{c,\rho})$ is unaffected by the
introduction of dependence.
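Corollary 5.5(ii) and (iii) can be checked numerically as well: maximizing $h_{c,\rho}(l) = 2cl^2\Phi(-\frac{l^3}{8}\sqrt{c/(1-\rho)^3})$ should give $\hat l_{c,\rho} = \sqrt{1-\rho}\,c^{-1/6}\hat l$, a speed of $c^{2/3}(1-\rho)h_{1,0}(\hat l)$ and an acceptance rate of about 0.574. The optimizer is a generic grid-plus-golden-section search chosen for illustration, and the helper names are hypothetical.

```python
import math


def Phi(z):
    """Standard normal c.d.f. (erfc keeps tail values positive)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def maximize(f, lo, hi, n_grid=400, tol=1e-12):
    """Maximize a unimodal f on [lo, hi]: grid bracket, then golden-section search."""
    step = (hi - lo) / n_grid
    grid = [lo + i * step for i in range(n_grid + 1)]
    best = max(range(n_grid + 1), key=lambda i: f(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):
            b, x2 = x2, x1
            x1 = b - inv_phi * (b - a)
        else:
            a, x1 = x1, x2
            x2 = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def h_mala_exch(l, c, rho):
    """h_{c,rho}(l) = 2 c l^2 Phi(-(l^3 / 8) sqrt(c / (1 - rho)^3))."""
    return 2.0 * c * l * l * Phi(-(l ** 3 / 8.0) * math.sqrt(c / (1.0 - rho) ** 3))


def accept_mala_exch(l, c, rho):
    """Limiting acceptance rate a^{c,rho}(l) from Corollary 5.5(i)."""
    return 2.0 * Phi(-(l ** 3 / 8.0) * math.sqrt(c / (1.0 - rho) ** 3))


def optimize_mala_exch(c, rho):
    l_opt = maximize(lambda l: h_mala_exch(l, c, rho), 0.0, 5.0)
    return l_opt, h_mala_exch(l_opt, c, rho), accept_mala_exch(l_opt, c, rho)


if __name__ == "__main__":
    l_hat, h_hat, _ = optimize_mala_exch(1.0, 0.0)
    for rho in (0.0, 0.5, 0.9):
        for c in (0.25, 1.0):
            l_opt, h_opt, a_opt = optimize_mala_exch(c, rho)
            ratio = h_opt / (c ** (2 / 3) * (1 - rho) * h_hat)
            print(f"rho={rho} c={c}: h / (c^(2/3)(1-rho) h_hat) = {ratio:.6f}, "
                  f"acceptance = {a_opt:.4f}")
```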

From Theorem 5.4, we see that the mixing time of $\mathbf{1}^T X^d$ is greater than
$O(d^{1/3})$ as $d \to \infty$. In fact, the mixing time of $\mathbf{1}^T X^d$ is $O(d^{4/3})$. Let
$\{J_t\}$ be a Poisson process with rate $d^{4/3}$ and, for $d \ge 1$, let
$U_t^d = \frac{1}{d-2}\sum_{i=3}^d X_{J_t,i}^d$.

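Theorem 5.6. Suppose that $0 < \rho < 1$ and that $c_d \to c$, as $d \to \infty$, for
some $0 < c \le 1$. Let $X_0^\infty = (X_{0,1}^1, X_{0,2}^2, \ldots)$ be constructed as in the prelude
to Theorem 5.1. Then, as $d \to \infty$,

$$U^d \Rightarrow U,$$

where $U_0 \sim N(0,\rho)$ and U satisfies the Langevin SDE

$$dU_t = (h_{c,\rho}(l))^{1/2}\,dB_t + h_{c,\rho}(l)\left\{-\frac{1}{2\rho}U_t\right\}dt,$$

where $h_{c,\rho}(l) = 2cl^2\Phi\left(-\frac{l^3}{8}\sqrt{\frac{c}{(1-\rho)^3}}\right)$, as before.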

The proofs of Theorems 5.4 and 5.6 are hybrids of those for the results of 
Section 4, and for Theorems 5.1 and 5.3 above, and are, hence, omitted. 

6. A simulation study. The rotational symmetry of the Gaussian
distribution effectively allows the dependence problem to be formulated as
one of heterogeneity of scale. Other distributional forms exist for which this
may be possible (e.g., the multivariate t-distribution), but it seems difficult
to derive results for very general families of target distribution without
resorting to ideas such as this. Therefore, to support the conjecture that the
conclusions of Sections 3-5 hold beyond the rigorous theoretical results, we
present the following simulation study. Furthermore, we demonstrate that
the asymptotic results are achieved in relatively low-dimensional ($d \ge 10$)
situations.

Throughout the simulation study we measure the speed/efficiency of the
algorithm by its first-order efficiency. That is, for a multidimensional
Markov chain X with first component $X^1$, say, the first-order efficiency
is defined to be $d\,E[(X_{t+1}^1 - X_t^1)^2]$ for RWM and $d^{1/3}E[(X_{t+1}^1 - X_t^1)^2]$ for
MALA, where $X_t$ is assumed to be stationary. For each of the target
distributions and different choices of c and d, we consider 50 different
proposal variances, $\sigma_{d,c}^2$. For each choice of proposal variance $\sigma_{d,c}^2$, we
started with $X_0$ drawn from the target distribution. We then ran the
algorithm for 100000 iterations. We estimate $E[(X_{t+1}^1 - X_t^1)^2]$ by
$\frac{1}{100000}\sum_{i=1}^{100000}(X_i^1 - X_{i-1}^1)^2$, and the acceptance rate is estimated by
$\frac{1}{100000}\sum_{i=1}^{100000}1_{\{X_i \ne X_{i-1}\}}$. We then plot acceptance rate against
$d\,E[(X_{t+1}^1 - X_t^1)^2]$ (first-order efficiency).
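The two estimators described above can be sketched as follows for RWM-within-Gibbs. As illustrative assumptions, the target is a product of N(0,1) densities, $\sigma_{d,c}^2 = l^2/d$, the subset size $dc$ is rounded, and the runs are much shorter than the 100000 iterations used in the study. The function name is hypothetical.

```python
import math
import random


def first_order_efficiency(d, c, l, n_iter, rng):
    """Estimate (acceptance rate, d * E[(X^1_{t+1} - X^1_t)^2]) for RWM-within-Gibbs
    on a N(0,1)^d product target, started in stationarity, with sigma^2 = l^2 / d."""
    k = max(1, round(d * c))
    sigma = l / math.sqrt(d)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]    # X_0 drawn from the target
    accepted = 0
    sq_jump = 0.0
    for _ in range(n_iter):
        active = rng.sample(range(d), k)
        prop = {i: x[i] + sigma * rng.gauss(0.0, 1.0) for i in active}
        # product target: only the updated coordinates enter the log ratio
        log_ratio = sum(-0.5 * (prop[i] ** 2 - x[i] ** 2) for i in active)
        old_first = x[0]
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            for i, v in prop.items():
                x[i] = v
            accepted += 1
        sq_jump += (x[0] - old_first) ** 2         # zero when rejected or not updated
    return accepted / n_iter, d * sq_jump / n_iter


if __name__ == "__main__":
    rng = random.Random(12)
    d, c = 20, 0.5
    for l in (0.5, 1.5, 2.38 / math.sqrt(c), 5.0, 8.0):
        acc, eff = first_order_efficiency(d, c, l, 20000, rng)
        print(f"l={l:.2f}  acceptance={acc:.3f}  first-order efficiency={eff:.3f}")
```

Plotting the printed efficiencies against the acceptance rates gives a small-scale version of the curves described in this section.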

We begin by considering RWM-within-Gibbs. We shall consider three
different target distributions: $\pi_d \sim N(0, \Sigma_\rho^d)$, $\pi_d \sim t_{50}(0, \Sigma_\rho^d)$ and
$\pi_d(x^d) = \prod_{i=1}^d \frac{1}{2}\exp(-|x_i^d|)$ (double-sided exponential). Note that the
distributions $t_{50}(0, \Sigma_\rho^d)$ ($\rho > 0$) and the double-sided exponential are not
covered by the asymptotic results of Sections 3 and 5. For $N(0, \Sigma_\rho^d)$ and
$t_{50}(0, \Sigma_\rho^d)$, we plot acceptance rate against the normalized first-order
efficiency, $\frac{d}{1-\rho}E[(X_{t+1}^1 - X_t^1)^2]$. The normalization is introduced to
take account of dependence (see Corollary 5.2).

Figures 1 and 2 give a representative sample of the simulation study we
conducted for a whole range of different values of c, d and $\rho$. The results are
as one would expect. In all cases the estimated optimal acceptance rate is
approximately 0.234. As can be seen from Figures 1 and 2, the normalized
first-order efficiency curves are virtually indistinguishable from one another
across the choices of c, d and $\rho$. Therefore, we have made no attempt to
differentiate between the different efficiency curves.

(Note that the results in Figure 3 are a representative sample from a much 
larger simulation study.) 

Figures 3 and 4 produce results in line with those expected from Sections 
3 and 5. This demonstrates that the conclusions of Sections 3 and 5 do extend 
beyond those target distributions for which rigorous statements have been 
made. 

We now turn our attention to MALA-within-Gibbs. In our simulation study
we shall consider only target densities of the form $\pi_d \sim N(0, \Sigma_\rho^d)$.

Simulations in Figures 5 and 6 show excellent agreement with Corollaries
4.3 and 5.5. Again, the results demonstrate the usefulness/relevance of the
asymptotic results for even fairly small d.
7. Discussion. A rather surprising property of high-dimensional Metropolis
and Langevin algorithms is the robustness of relative efficiency as a function
of acceptance rate. In particular, the optimal acceptance rates 0.234 and
0.574 for Metropolis and Langevin, respectively, appear to be robust to many
kinds of perturbation of the target density. A remarkable conclusion of this
paper is that this apparent robustness of relative efficiency, as a function of
acceptance rate, seems to extend quite readily to updating schemes where
only a fixed proportion of components are updated at once.

A further unexpected conclusion concerns the issue of optimization in
c. Here, very clear-cut statements appear to be available, with
lower-dimensional updates seeming to be optimal for the Metropolis
algorithm (as seen from Theorem 3.1 and Corollary 3.2), whereas
higher-dimensional updates are to be preferred (at least before computing
time has been taken into consideration) for MALA schemes (see Theorem 4.2
and Corollary 4.3). The robustness of these conclusions to dependence in the
target density is seen in the results of Section 5 and, supported by the
simulation study in Section 6, seems contrary to the general intuition that
"block updating" improves MCMC mixing (at least for the Metropolis
results). However, our results show that this intuition is only correct for
schemes where the multivariate update step utilizes the structure of the
target density (as, e.g., in the Gibbs sampler or, to a lesser extent, MALA).

[Figure 1 plot; horizontal axis: Acceptance Rate.]
Fig. 1. Normalized first-order efficiency of RWM-within-Gibbs, $\frac{d}{1-\rho}E[(X_{t+1}^1 - X_t^1)^2]$, as
a function of overall acceptance rates for each combination of (d = 20; c = 0.25, 0.5, 0.75, 1;
$\rho$ = 0, 0.5), with $\pi_d \sim N(0, \Sigma_\rho^d)$.

[Figure 2 plot; horizontal axis: Acceptance Rate.]
Fig. 2. Normalized first-order efficiency of RWM-within-Gibbs, $\frac{d}{1-\rho}E[(X_{t+1}^1 - X_t^1)^2]$,
as a function of overall acceptance rates for each combination (c = 0.5; d = 10, 20, 50;
$\rho$ = 0, 0.5), with $\pi_d \sim N(0, \Sigma_\rho^d)$.

We believe that these results should have quite fundamental implications
for practical MCMC use, although, of course, they should be treated with
care since they are only asymptotic. Our results have been shown in the
simulation study to hold approximately in very low-dimensional problems,
although the speed at which the infinite-dimensional limit is reached does
vary in a complicated way, in particular, with c and with measures of
dependence in the target density (such as $\rho$ in the exchangeable normal
examples).

[Figure 3 plot; horizontal axis: Acceptance Rate.]
Fig. 3. Normalized first-order efficiency of RWM-within-Gibbs, $\frac{d}{1-\rho}E[(X_{t+1}^1 - X_t^1)^2]$,
as a function of overall acceptance rates for each combination: (i) (d = 20;
c = 0.25, 0.5, 0.75, 1; $\rho$ = 0, 0.5) and (ii) (c = 0.5; d = 10, 20, 50; $\rho$ = 0, 0.5), with
$\pi_d \sim t_{50}(0, \Sigma_\rho^d)$.

The results for the exchangeable normal example show that certain
functions can converge at different rates from others ($\bar X$ converging at
rate $d^2$, while $X_i - \bar X$ converges at rate d), and this can cause serious
practical problems for the MCMC practitioner. In particular, any one
coordinate $X_i$ might converge rapidly, on a given time scale, to the wrong
target density. Certainly, it would be extremely difficult to detect such
problems empirically.
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Acceptance Rule 

Fig. 4. Normalized first-order efficiency of RWM -within- Gibbs, j^~E[(Xl +1 — X}) 2 ], as 
a function of overall acceptance rates for each combination (d = 40; c = 0.25, 0.5, 0.75, 1), 
withn d (x d ) = U d i=1 ±exp(-\xt\). 

The results in this paper are given for Metropolis and MALA algorithms. 
However, the use of these two methods is, in some sense, illustrative, and 
other algorithms (such as higher-order Langevin algorithms using, e.g., the Ozaki discretization [10]) are expected to yield similar conclusions.

APPENDIX 

A.1. Proofs of Section 3. Theorem 3.1 implies that the first component acts independently of all others as $d \to \infty$. Intuitively, this occurs because all other $(d-1)$ terms contribute expressions to the accept/reject ratio which turn out to obey the SLLN and, thus, can be replaced by their deterministic limits. To make this idea rigorous, we need to define a set in $\mathbb{R}^d$ on which the contributions of these remaining components are well approximated by the appropriate LLN limits.
[Figure 5: normalized efficiency plotted against acceptance rate.]

Fig. 5. Normalized first-order efficiency of RWM-within-Gibbs, $c^{-2/3}d^{1/3}E[(X^1_{t+1} - X^1_t)^2]$, as a function of overall acceptance rates for each combination of ($d = 20$; $c = 0.25, 0.5, 0.75, 1$; $\rho = 0, 0.5$), with $\pi_d \sim N(0, \Sigma_\rho)$.



Motivated by this idea, we construct sets of tolerances around average values for quantities which will appear in the accept/reject ratio. Thus, we define the sequence of sets $\{F_d \subseteq \mathbb{R}^d, d > 1\}$ by

$$F_d = \Big\{x^d : \Big|\frac{1}{d-1}\sum_{i=2}^d g'(x_i^d)^2 - I\Big| < d^{-1/8},\ \Big|\frac{1}{d-1}\sum_{i=2}^d \big(g''(x_i^d) + g'(x_i^d)^2\big)\Big| < d^{-1/8},\ \frac{1}{d-1}\sum_{i=2}^d g'(x_i^d)^4 < d^{1/8}\Big\} = \bigcap_{k=1}^3 F_{d,k}, \text{ say},$$

where $I$ is defined in Theorem 3.1. Let $x^\infty = (x_1, x_2, \ldots)$ and, for $d > 1$, let $x^d = (x_1^d, x_2^d, \ldots, x_d^d)$, where, for $1 \le i \le d$, $x_i^d = x_i$. Thus, we shall use $x_i^d$ and $x_i$ interchangeably, as appropriate.

[Figure 6: normalized efficiency plotted against acceptance rate.]

Fig. 6. Normalized first-order efficiency of MALA-within-Gibbs, $c^{-2/3}d^{1/3}E[(X^1_{t+1} - X^1_t)^2]$, as a function of overall acceptance rates for each combination ($c = 0.5$; $d = 10, 20, 50$; $\rho = 0, 0.5$), with $\pi_d \sim N(0, \Sigma_\rho)$.



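Lemma A.1. For $k = 1, 2, 3$ and $t > 0$,

$$P(\mathbf{U}_s^d \in F_{d,k},\ 0 \le s \le t) \to 1 \quad \text{as } d \to \infty \tag{A.1}$$

and, hence,

$$P(\mathbf{U}_s^d \in F_d,\ 0 \le s \le t) \to 1 \quad \text{as } d \to \infty.$$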
Proof. The cases k = 1 and k = 2 are proved in [6], Lemma 2.1. The
case k = 3 is proved similarly using Markov's inequality and (3.2). The 
lemma then follows. □ 



For any random variable $X$ and for any subset $A \subseteq \mathbb{R}$, let $E^*[X] = E[X \mid \chi_1^d = 1]$ and $P^*(X \in A) = P(X \in A \mid \chi_1^d = 1)$.

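Let $G_d$ be the (discrete-time) generator of $\mathbf{X}^d$, and let $V \in C_c^\infty$ (the space of infinitely differentiable functions on compact support) be an arbitrary test function of the first component only. Thus,

$$\begin{aligned} G_dV(x^d) &= d\,E\Big[(V(Y^d) - V(x^d))\Big\{1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big] \\ &= d\,P(\chi_1^d = 1)\,E^*\Big[(V(Y_1^d) - V(x_1^d))\Big\{1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big], \end{aligned} \tag{A.2}$$

since $Y_1^d = x_1^d$ if $\chi_1^d = 0$.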

The generator $G$ of the one-dimensional diffusion described in (3.4), for an arbitrary test function $V \in C_c^\infty$, is given by

$$GV(x_1) = 2cl^2\,\Phi\Big(-\frac{l\sqrt{cI}}{2}\Big)\Big\{\frac12 g'(x_1)V'(x_1) + \frac12 V''(x_1)\Big\}. \tag{A.3}$$



(Note that, under the conditions imposed in Theorem 3.1, $C_c^\infty$ forms a core for the full generator.) By Lemma A.1, we can restrict attention to $x^d \in F_d$. The aim will therefore be to show that, for all $x^d \in F_d$,

$$G_dV(x^d) \to GV(x_1) \quad \text{as } d \to \infty.$$

The proof of Theorem 3.1 will then be fairly straightforward.
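As a numerical illustration of (A.3), the sketch below maximizes the speed $h_c(l) = 2cl^2\Phi(-l\sqrt{cI}/2)$ of the limiting diffusion over $l$ for several update proportions $c$, assuming a target with $I = 1$. It is a minimal sketch using only the standard library (a golden-section search); the function names are ours, not the paper's. Under these assumptions the maximizing acceptance rate $2\Phi(-l\sqrt{cI}/2)$ comes out at about 0.234 for every $c$, and the maximal speed does not depend on $c$.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def speed_rwm(l: float, c: float, I: float) -> float:
    """Speed h_c(l) = 2 c l^2 Phi(-l sqrt(c I) / 2) of the limiting diffusion in (A.3)."""
    return 2.0 * c * l * l * Phi(-l * math.sqrt(c * I) / 2.0)


def acceptance_rwm(l: float, c: float, I: float) -> float:
    """Limiting overall acceptance rate 2 Phi(-l sqrt(c I) / 2)."""
    return 2.0 * Phi(-l * math.sqrt(c * I) / 2.0)


def argmax_golden(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Golden-section search for the maximiser of a unimodal function f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:  # maximiser lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:  # maximiser lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)


if __name__ == "__main__":
    I = 1.0  # assumption: unit Fisher information, e.g. a standard normal marginal
    for c in (0.25, 0.5, 1.0):
        l_hat = argmax_golden(lambda l: speed_rwm(l, c, I), 0.01, 20.0)
        print(f"c={c}: l_hat={l_hat:.3f}, "
              f"acceptance={acceptance_rwm(l_hat, c, I):.3f}, "
              f"speed={speed_rwm(l_hat, c, I):.3f}")
```

Because $h_c(l)$ depends on $l$ and $c$ only through $l\sqrt c$ (apart from the overall factor), the optimal $l$ grows like $c^{-1/2}$ while the optimal acceptance rate and speed stay fixed.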

Thus, we begin by giving a Taylor series approximation for $G_dV(x^d)$ in Lemma A.3, for which we will require the following lemma.



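Lemma A.2. For any $V \in C_c^\infty$ (the space of infinitely differentiable functions on compact support),

$$\sup_{x^d \in F_d}\Big|d\,E^*[V(Y_1^d) - V(x_1^d)] - \frac12 l^2 V''(x_1)\Big| \to 0 \quad \text{as } d \to \infty \tag{A.4}$$

and

$$\sup_{x^d \in F_d}\Big|\sigma_d d\,E^*[Z_1(V(Y_1^d) - V(x_1^d))] - l^2 V'(x_1)\Big| \to 0 \quad \text{as } d \to \infty, \tag{A.5}$$

with $x_1 = x_1^d$.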
Proof. For $\chi_1^d = 1$, $Y_1^d = x_1^d + \sigma_d Z_1$. Thus, by Taylor's theorem,

$$V(Y_1^d) - V(x_1^d) = V'(x_1^d)(\sigma_d Z_1) + \frac12 V''(x_1^d)(\sigma_d Z_1)^2 + \frac16 V'''(W_1)(\sigma_d Z_1)^3 \tag{A.6}$$

for some $W_1$ lying between $x_1^d$ and $Y_1^d$.

Since $\sigma_d^2 = l^2/d$, $E[Z_1] = E[Z_1^3] = 0$, $E[Z_1^2] = 1$ and $V'''$ is bounded, the lemma then follows by substituting (A.6) into the left-hand sides of (A.4) and (A.5). □
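The limits in Lemma A.2 can be checked numerically. The sketch below evaluates the two quantities on the left-hand sides of (A.4) and (A.5), conditional on $\chi_1^d = 1$ and with $\sigma_d = l/\sqrt d$, by Gauss-Hermite quadrature. It assumes a hypothetical test function $V(x) = e^{-x^2}$, which is smooth with bounded derivatives but not compactly supported; the discrepancies from $\frac12 l^2V''(x_1)$ and $l^2V'(x_1)$ should shrink roughly like $1/d$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_expect(f, deg: int = 80) -> float:
    """E[f(Z)] for Z ~ N(0, 1) via probabilists' Gauss-Hermite quadrature."""
    z, w = hermegauss(deg)  # nodes/weights for the weight function exp(-z^2 / 2)
    return float(np.sum(w * f(z)) / np.sqrt(2.0 * np.pi))


# Hypothetical test function and its first two derivatives.
def V(x):
    return np.exp(-x**2)


def dV(x):
    return -2.0 * x * np.exp(-x**2)


def d2V(x):
    return (4.0 * x**2 - 2.0) * np.exp(-x**2)


def lemma_a2_terms(x1: float, l: float, d: int):
    """Return d E*[V(Y_1) - V(x_1)] and sigma_d d E*[Z_1 (V(Y_1) - V(x_1))]
    for the update Y_1 = x_1 + sigma_d Z_1 with sigma_d = l / sqrt(d)."""
    sigma = l / np.sqrt(d)
    t4 = d * gauss_expect(lambda z: V(x1 + sigma * z) - V(x1))
    t5 = sigma * d * gauss_expect(lambda z: z * (V(x1 + sigma * z) - V(x1)))
    return t4, t5


if __name__ == "__main__":
    x1, l = 0.3, 1.0
    for d in (100, 10_000):
        t4, t5 = lemma_a2_terms(x1, l, d)
        print(d, t4 - 0.5 * l**2 * d2V(x1), t5 - l**2 * dV(x1))
```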

Lemma A.3. Let

$$\tilde G_dV(x^d) = \frac12 cl^2 V''(x_1)\,E^*[1 \wedge e^{B_d}] + cl^2 V'(x_1)g'(x_1)\,E^*[e^{B_d}; B_d < 0],$$

where $B_d\ (= B_d(x^d)) = \sum_{i=2}^d (g(Y_i^d) - g(x_i^d))$. Then we have that

$$\sup_{x^d \in F_d}|G_dV(x^d) - \tilde G_dV(x^d)| \to 0 \quad \text{as } d \to \infty. \tag{A.7}$$

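Proof. Decomposing $Y^d$ into $(Y_1^d, Y_-^d)$, where $Y_-^d = (Y_2^d, \ldots, Y_d^d)$, and using independence gives

$$G_dV(x^d) = dc_d\,E^*_{Y_1^d}\Big[(V(Y_1^d) - V(x_1^d))\,E^*_{Y_-^d}\Big[1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big]\Big].$$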

We shall begin by concentrating on the inner expectation, by recalling the following fact noted in [2]. Let $h$ be a twice differentiable function on $\mathbb{R}$; then the function $z \mapsto 1 \wedge e^{h(z)}$ is also twice differentiable, except at a countable number of points, with first derivative given Lebesgue almost everywhere by the function

$$\frac{d}{dz}\big(1 \wedge e^{h(z)}\big) = \begin{cases} h'(z)e^{h(z)}, & \text{if } h(z) < 0, \\ 0, & \text{if } h(z) > 0. \end{cases}$$

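Now take $h_d(z)\ (= h_d(z; x^d)) = (g(x_1^d + \sigma_d z) - g(x_1^d)) + B_d$ and let

$$\gamma_d(z) = E^*_{Y_-^d}\Big[1 \wedge \frac{f(x_1^d + \sigma_d z)}{f(x_1^d)}\prod_{i=2}^d \frac{f(Y_i^d)}{f(x_i^d)}\Big].$$

Thus, $\gamma_d(z) = E^*_{Y_-^d}[1 \wedge e^{h_d(z)}]$, and so, for almost every $x_1^d \in \mathbb{R}$, there exists $W$ lying between $0$ and $z$ such that

$$\begin{aligned} \gamma_d(z) &= E^*_{Y_-^d}[1 \wedge e^{h_d(0)}] + z\,E^*_{Y_-^d}[\sigma_d g'(x_1^d)e^{h_d(0)}; h_d(0) < 0] \\ &\quad + \frac{z^2}{2}E^*_{Y_-^d}\big[\sigma_d^2\big(g''(x_1^d + \sigma_d W) + g'(x_1^d + \sigma_d W)^2\big)e^{h_d(W)}; h_d(W) < 0\big]. \end{aligned} \tag{A.8}$$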
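The key results to note are that $h_d(0) = B_d$ and that, conditional upon $\chi_1^d = 1$, $Y_1^d$ and $Y_-^d$ are independent. Therefore,

$$\begin{aligned} G_dV(x^d) &= dc_d\,E^*_{Y_1^d}\Big[(V(Y_1^d) - V(x_1^d))\Big\{E^*_{Y_-^d}[1 \wedge e^{B_d}] + Z_1E^*_{Y_-^d}[\sigma_d g'(x_1)e^{B_d}; B_d < 0] \\ &\qquad + \frac{Z_1^2}{2}E^*_{Y_-^d}\big[\sigma_d^2\big(g''(x_1^d + \sigma_d W) + g'(x_1^d + \sigma_d W)^2\big)e^{h_d(W)}; h_d(W) < 0\big]\Big\}\Big] \\ &= dc_d\,E^*[V(Y_1^d) - V(x_1)]\,E^*[1 \wedge e^{B_d}] + g'(x_1)\,dc_d\sigma_d\,E^*[(V(Y_1^d) - V(x_1^d))Z_1]\,E^*[e^{B_d}; B_d < 0] \\ &\quad + dc_d\,E^*_{Y_1^d}\Big[(V(Y_1^d) - V(x_1^d))\frac{Z_1^2}{2}E^*_{Y_-^d}\big[\sigma_d^2\big(g''(x_1^d + \sigma_d W) + g'(x_1^d + \sigma_d W)^2\big)e^{h_d(W)}; h_d(W) < 0\big]\Big] \\ &= \hat G_dV(x^d) + D_d(x^d; Z_1; W), \text{ say.} \end{aligned} \tag{A.9}$$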

Since $E^*[1 \wedge e^{B_d}],\ E^*[e^{B_d}; B_d < 0] \le 1$ and $x_1 = x_1^d$, it follows from Lemma A.2 that

$$\sup_{x^d \in F_d}|\hat G_dV(x^d) - \tilde G_dV(x^d)| \to 0 \quad \text{as } d \to \infty.$$

Thus, to prove the lemma, it is sufficient to show that, for all $x^d \in F_d$, $D_d(x^d; Z_1; W)$ converges to $0$ as $d \to \infty$. By Taylor's theorem, we have that



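$$|V(Y_1^d) - V(x_1^d)| \le \sup_{a_1 \in \mathbb{R}}|V'(a_1)|\,\sigma_d|Z_1|$$

and

$$|g'(x_1^d + \sigma_d W)| \le |g'(x_1^d)| + \sigma_d|W|\sup_{a_2 \in \mathbb{R}}|g''(a_2)| \le |g'(x_1^d)| + \sigma_d|Z_1|\sup_{a_2 \in \mathbb{R}}|g''(a_2)|.$$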
Since $V'$ and $g''$ are bounded functions, it follows that, for all $x^d \in F_d$,

$$D_d(x^d; Z_1; W) \le dc_d\,K\sigma_d^3\big(K + |g'(x_1^d)| + \sigma_d K\big)^2 \to 0 \quad \text{as } d \to \infty,$$

for some $K > 0$, and the lemma is proved. □

Lemma A.3 states that, for all $x^d \in F_d$, the generator $G_d$ can be approximated by the generator $\tilde G_d$, which resembles the limiting generator $G$. Thus, we now need to consider, for all $x^d \in F_d$, $E^*[1 \wedge e^{B_d}]$ and $E^*[e^{B_d}; B_d < 0]$. The aim is to approximate $B_d$ by a more convenient quantity $A_d$ (to be defined in Lemma A.6) and, hence, show that

$$E^*[1 \wedge e^{B_d}] \to 2\Phi\Big(-\frac{l\sqrt{cI}}{2}\Big) \quad \text{and} \quad E^*[e^{B_d}; B_d < 0] \to \Phi\Big(-\frac{l\sqrt{cI}}{2}\Big) \quad \text{as } d \to \infty.$$

This will be done in the following lemmas.

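Lemma A.4. Let $\lambda_d\ (= \lambda_d(x^d)) = \frac1d\sum_{i=2}^d \chi_i^d g'(x_i)^2$. For any $\epsilon > 0$,

$$\sup_{x^d \in F_d}P^*(|\lambda_d - cI| > \epsilon) \to 0 \quad \text{as } d \to \infty.$$

Proof. Let $R_d\ (= R_d(x^d)) = \frac{1}{d-1}\sum_{i=2}^d g'(x_i)^2$. Then, for $x^d \in F_d$,

$$|\lambda_d - cI| \le |\lambda_d - E^*[\lambda_d]| + |E^*[\lambda_d] - cR_d| + |cR_d - cI|.$$

Note that $E^*[\lambda_d] = \frac{c_dd - 1}{d}R_d$, and so, by the definition of $F_d$, we have that

$$|E^*[\lambda_d] - cR_d| + |cR_d - cI| \to 0 \quad \text{as } d \to \infty.$$

Therefore, to prove the lemma, it suffices to show that, for any $\epsilon > 0$, $P^*(|\lambda_d - E^*[\lambda_d]| > \epsilon) \to 0$ as $d \to \infty$. Note that

$$\lambda_d^2 = \frac{1}{d^2}\sum_{i=2}^d\sum_{j=2}^d \chi_i^d\chi_j^d g'(x_i)^2 g'(x_j)^2$$

and so, since, conditional upon $\chi_1^d = 1$, the remaining $c_dd - 1$ updated components form a uniformly chosen subset of $\{2, \ldots, d\}$,

$$E^*[\lambda_d^2] = \frac{c_dd - 1}{d-1}\,\frac{1}{d^2}\sum_{i=2}^d g'(x_i)^4 + \frac{(c_dd - 1)(c_dd - 2)}{(d-1)(d-2)}\,\frac{1}{d^2}\sum_{i \ne j} g'(x_i)^2 g'(x_j)^2.$$

Hence,

$$E^*[(\lambda_d - E^*[\lambda_d])^2] = \frac{(c_dd - 1)(1 - c_d)d}{(d-1)(d-2)}\,\frac{1}{d^2}\sum_{i=2}^d g'(x_i)^4 - \frac{(c_dd - 1)(1 - c_d)}{d(d-2)}R_d^2.$$

Then, since $\sup_{x^d \in F_d}\big|\frac{1}{(d-1)^2}\sum_{i=2}^d g'(x_i)^4\big| \to 0$ and $c_d \to c$ as $d \to \infty$, it follows that, for all $x^d \in F_d$, $E^*[(\lambda_d - E^*[\lambda_d])^2] \to 0$ as $d \to \infty$ and, hence, by Chebyshev's inequality,

$$\sup_{x^d \in F_d}P^*(|\lambda_d - E^*[\lambda_d]| > \epsilon) \to 0 \quad \text{as } d \to \infty,$$

as required. □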
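The moment calculations in the proof of Lemma A.4 rest on the updated set being, given $\chi_1^d = 1$, a uniformly chosen subset of $c_dd - 1$ of the components $2, \ldots, d$. The Monte Carlo sketch below checks the resulting formulas for $E^*[\lambda_d]$ and the variance of $\lambda_d$ under that assumption. The weights standing in for $g'(x_i)^2$ are arbitrary positive numbers chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def lambda_moments_formula(w: np.ndarray, m: int, d: int):
    """Mean and variance of lambda_d = (1/d) sum_{i>=2} chi_i w_i given chi_1 = 1,
    where w_i stands for g'(x_i)^2 (i = 2..d) and m = c_d * d components are updated."""
    c_d = m / d
    R = w.sum() / (d - 1)
    mean = (m - 1) / d * R
    var = ((m - 1) * (1 - c_d) * d / ((d - 1) * (d - 2))) * np.sum(w**2) / d**2 \
        - (m - 1) * (1 - c_d) * R**2 / (d * (d - 2))
    return mean, var


def lambda_samples(w: np.ndarray, m: int, d: int, n_rep: int, rng) -> np.ndarray:
    """Simulate lambda_d: given chi_1 = 1, the other m - 1 updated components are a
    uniformly random subset of {2, ..., d} (stored here as indices 0..d-2 of w)."""
    keys = rng.random((n_rep, d - 1))
    chosen = np.argsort(keys, axis=1)[:, : m - 1]  # uniform (m-1)-subsets, one per row
    return w[chosen].sum(axis=1) / d


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, m = 50, 20
    w = rng.exponential(size=d - 1)  # illustrative values of g'(x_i)^2
    sims = lambda_samples(w, m, d, 20_000, rng)
    print(lambda_moments_formula(w, m, d), (sims.mean(), sims.var()))
```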
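Lemma A.5. Let

$$W_d\ (= W_d(x^d)) = \sum_{i=2}^d\Big\{\frac12 g''(x_i)(Y_i^d - x_i)^2 + \frac{cl^2}{2(d-1)}g'(x_i)^2\Big\},$$

where $c_d \to c$ as $d \to \infty$. Then, recalling that $\sigma_d = l/\sqrt d$,

$$\sup_{x^d \in F_d}E^*[|W_d|] \to 0 \quad \text{as } d \to \infty.$$

Proof. First, note that $E^*[|W_d|]^2 \le E^*[W_d^2]$. Then, by direct calculations,

$$\begin{aligned} E^*[W_d^2] &= \frac{l^4}{4d^2}\Big(3\,\frac{c_dd - 1}{d-1} - \frac{(c_dd - 1)(c_dd - 2)}{(d-1)(d-2)}\Big)\sum_{i=2}^d g''(x_i)^2 \\ &\quad + \frac{l^4}{4d^2}\,\frac{(c_dd - 1)(c_dd - 2)}{(d-1)(d-2)}\Big(\sum_{i=2}^d g''(x_i)\Big)^2 + \frac{cl^4}{2d}\,\frac{c_dd - 1}{d-1}R_d\sum_{i=2}^d g''(x_i) + \frac{c^2l^4}{4}R_d^2 \\ &= W_{d,1} + W_{d,2}, \text{ say}, \end{aligned}$$

where $W_{d,1}$ denotes the first term on the right-hand side and $R_d$ is defined in the proof of Lemma A.4. Let

$$W_{d,3}\ (= W_{d,3}(x^d)) = \Big\{\frac{cl^2}{2(d-1)}\sum_{i=2}^d\big(g''(x_i) + g'(x_i)^2\big)\Big\}^2;$$

since $c_d \to c$ as $d \to \infty$, we have that

$$\sup_{x^d \in F_d}|W_{d,2} - W_{d,3}| \to 0 \quad \text{as } d \to \infty.$$

However, by the definition of $F_d$, $\sup_{x^d \in F_d}|W_{d,3}| \to 0$ and, since $g''$ is bounded, $\sup_{x^d \in F_d}|W_{d,1}| \to 0$ as $d \to \infty$. The lemma follows immediately. □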

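Lemma A.6. Let $A_d\ (= A_d(x^d)) = \sum_{i=2}^d\Big\{g'(x_i)(Y_i^d - x_i) - \frac{cl^2}{2(d-1)}g'(x_i)^2\Big\}$. Then,

$$\sup_{x^d \in F_d}|E^*[1 \wedge e^{A_d}] - E^*[1 \wedge e^{B_d}]| \to 0 \quad \text{as } d \to \infty \tag{A.10}$$

and

$$\sup_{x^d \in F_d}|E^*[e^{A_d}; A_d < 0] - E^*[e^{B_d}; B_d < 0]| \to 0 \quad \text{as } d \to \infty. \tag{A.11}$$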

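Proof. Note that

$$B_d = \sum_{i=2}^d(g(Y_i^d) - g(x_i)) = \sum_{i=2}^d\Big\{g'(x_i)(Y_i^d - x_i) + \frac12 g''(x_i)(Y_i^d - x_i)^2 + \frac16 g'''(a_i^d)(Y_i^d - x_i)^3\Big\},$$

for some $a_i^d$ lying between $x_i$ and $Y_i^d$. Therefore, by [6], Proposition 2.2,

$$\begin{aligned} |E^*[1 \wedge e^{A_d}] - E^*[1 \wedge e^{B_d}]| &\le E^*[|W_d|] + \sup_{a \in \mathbb{R}}|g'''(a)|\,\frac16\sum_{i=2}^d E^*[|Y_i^d - x_i|^3] \\ &= E^*[|W_d|] + \sup_{a \in \mathbb{R}}|g'''(a)|\,\frac{(c_dd - 1)\sigma_d^3}{6}E[|Z_1|^3]. \end{aligned} \tag{A.12}$$

Now let $\varphi_d = \sup_{x^d \in F_d}\Big\{E^*[|W_d|] + \sup_{a \in \mathbb{R}}|g'''(a)|\,\frac{(c_dd - 1)\sigma_d^3}{6}E[|Z_1|^3]\Big\}$, where $W_d$ is defined in Lemma A.5. Then, since $g'''$ is a bounded function, it follows from Lemma A.5 that $\varphi_d \to 0$ as $d \to \infty$ and so (A.10) is proved.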

Let $J_d\ (= J_d(x^d)) = (e^{A_d}; A_d < 0) - (e^{B_d}; B_d < 0)$ and let $\delta_d = \varphi_d^{1/2}$. Then we proceed by showing that

$$\sup_{x^d \in F_d}P^*(|J_d| > \delta_d) \to 0 \quad \text{as } d \to \infty.$$

Note that, if $A_d, B_d \ge 0$, then

$$|J_d| = 0 \le |A_d - B_d|$$
and if $A_d, B_d < 0$, then

$$|J_d| = |\exp(A_d) - \exp(B_d)| \le |A_d - B_d|.$$

Therefore, it follows that

$$P^*(|J_d| > \delta_d) \le P^*(-\delta_d < A_d < \delta_d) + P^*(|A_d - B_d| > \delta_d). \tag{A.13}$$

By Markov's inequality,

$$\begin{aligned} P^*(|A_d - B_d| > \delta_d) &\le \frac{1}{\delta_d}E^*[|A_d - B_d|] \\ &\le \frac{1}{\delta_d}\Big\{E^*[|W_d|] + \sup_{a \in \mathbb{R}}|g'''(a)|\,\frac16\sum_{i=2}^d E^*[|Y_i^d - x_i|^3]\Big\} \le \frac{\varphi_d}{\delta_d} = \delta_d, \end{aligned} \tag{A.14}$$

and so, $P^*(|A_d - B_d| > \delta_d) \to 0$ as $d \to \infty$, uniformly for $x^d \in F_d$.
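Fix $x^d \in F_d$; then, for any $\epsilon > 0$, by Lemma A.4,

$$P^*(|\lambda_d - cI| > \epsilon) \to 0 \quad \text{as } d \to \infty. \tag{A.15}$$

Hence, since, conditional upon $\{\chi_i^d\}$, $A_d \sim N(-\frac12 cl^2R_d, l^2\lambda_d)$, and since $\delta_d \to 0$ and $R_d \to I$,

$$E^*\Big[\Phi\Big(\frac{\delta_d + \frac12 cl^2R_d}{l\sqrt{\lambda_d}}\Big) - \Phi\Big(\frac{-\delta_d + \frac12 cl^2R_d}{l\sqrt{\lambda_d}}\Big)\Big] \to 0 \quad \text{as } d \to \infty.$$

Thus,

$$\sup_{x^d \in F_d}P^*(-\delta_d < A_d < \delta_d) \to 0 \quad \text{as } d \to \infty. \tag{A.16}$$

Therefore, by (A.13)-(A.16), $\sup_{x^d \in F_d}P^*(|J_d| > \delta_d) \to 0$ as $d \to \infty$. Then, since $|J_d| \le 1$, it follows that $\sup_{x^d \in F_d}E^*[|J_d|] \to 0$ as $d \to \infty$ and so (A.11) is proved. □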



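Lemma A.7.

$$\sup_{x^d \in F_d}\Big|E^*[1 \wedge e^{A_d}] - 2\Phi\Big(-\frac{l\sqrt{cI}}{2}\Big)\Big| \to 0 \quad \text{as } d \to \infty \tag{A.17}$$

and

$$\sup_{x^d \in F_d}\Big|E^*[e^{A_d}; A_d < 0] - \Phi\Big(-\frac{l\sqrt{cI}}{2}\Big)\Big| \to 0 \quad \text{as } d \to \infty. \tag{A.18}$$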
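Proof. Since, conditional upon $\{\chi_i^d\}$, $A_d \sim N(-\frac12 cl^2R_d, l^2\lambda_d)$, it follows by [6], Proposition 2.4, that

$$E^*[1 \wedge e^{A_d}] = E^*\Big[\Phi\Big(-\frac{clR_d}{2\sqrt{\lambda_d}}\Big) + \exp\Big(-\frac{l^2}{2}(cR_d - \lambda_d)\Big)\Phi\Big(-l\sqrt{\lambda_d} + \frac{clR_d}{2\sqrt{\lambda_d}}\Big)\Big]. \tag{A.19}$$

Since, for any $x^d \in F_d$ and $\epsilon > 0$, $P^*(|R_d - I| > \epsilon) \to 0$ and $P^*(|\lambda_d - cI| > \epsilon) \to 0$ as $d \to \infty$, (A.17) follows from (A.19). (A.18) is proved similarly. □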
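The Gaussian identity behind (A.19) can be written down and checked directly. For $A \sim N(\mu, s^2)$, $E[e^A; A < 0] = e^{\mu + s^2/2}\Phi(-s - \mu/s)$ and $E[1 \wedge e^A] = \Phi(\mu/s) + E[e^A; A < 0]$; with $\mu = -s^2/2$ these reduce to $2\Phi(-s/2)$ and $\Phi(-s/2)$, which are the limits in (A.17) and (A.18) with $s = l\sqrt{cI}$. The sketch below compares the closed forms with Monte Carlo estimates; the parameter values are arbitrary.

```python
import math

import numpy as np


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_exp_neg_part(mu: float, s: float) -> float:
    """E[e^A; A < 0] for A ~ N(mu, s^2)."""
    return math.exp(mu + 0.5 * s * s) * Phi(-s - mu / s)


def mean_min_one_exp(mu: float, s: float) -> float:
    """E[min(1, e^A)] = P(A >= 0) + E[e^A; A < 0] for A ~ N(mu, s^2)."""
    return Phi(mu / s) + mean_exp_neg_part(mu, s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, s = -0.8, 1.3
    a = rng.normal(mu, s, size=1_000_000)
    print(mean_min_one_exp(mu, s), np.minimum(1.0, np.exp(a)).mean())
    print(mean_exp_neg_part(mu, s), np.where(a < 0, np.exp(a), 0.0).mean())
```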

We are now in a position to show that, for all $x^d \in F_d$, the generator $G_d$ converges to the generator $G$ as $d \to \infty$.

Theorem A.8. For $V \in C_c^\infty$,

$$\sup_{x^d \in F_d}|G_dV(x^d) - GV(x_1)| \to 0 \quad \text{as } d \to \infty.$$

Proof. By Lemma A.3,

$$\sup_{x^d \in F_d}|G_dV(x^d) - \tilde G_dV(x^d)| \to 0 \quad \text{as } d \to \infty,$$

and by Lemmas A.6 and A.7,

$$\sup_{x^d \in F_d}|\tilde G_dV(x^d) - GV(x_1)| \to 0 \quad \text{as } d \to \infty.$$

Thus, the theorem is proved. □
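To see the limit of Theorem A.8 at finite $d$, the sketch below runs RWM-within-Gibbs on a standard normal product target ($g(x) = -x^2/2$, so $I = 1$). It assumes, as in the moment calculations for Lemma A.4, that each iteration updates a uniformly chosen subset of $cd$ components with proposal scale $\sigma_d = l/\sqrt d$, and it compares the empirical acceptance rate with the limit $2\Phi(-l\sqrt{cI}/2)$. The dimension, run length and seed are illustrative choices.

```python
import math

import numpy as np


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rwm_within_gibbs_acceptance(d: int, c: float, l: float, n_iter: int, rng) -> float:
    """Empirical acceptance rate of RWM-within-Gibbs for the N(0, I_d) target.

    Each iteration proposes to move a uniformly chosen subset of round(c*d)
    components by independent N(0, l^2/d) increments and accepts or rejects
    the whole block with the Metropolis ratio.
    """
    m = int(round(c * d))
    sigma = l / math.sqrt(d)
    x = rng.standard_normal(d)  # start in stationarity
    accepted = 0
    for _ in range(n_iter):
        idx = rng.choice(d, size=m, replace=False)
        y = x[idx] + sigma * rng.standard_normal(m)
        # log pi(Y) - log pi(x) only involves the updated coordinates
        log_ratio = -0.5 * (y @ y - x[idx] @ x[idx])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x[idx] = y
            accepted += 1
    return accepted / n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    d, c = 400, 0.5
    l = 2.38 / math.sqrt(c)  # so that l * sqrt(c * I) = 2.38
    print(rwm_within_gibbs_acceptance(d, c, l, 4000, rng),
          2 * Phi(-l * math.sqrt(c) / 2))
```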

Proof of Theorem 3.1. The proof is similar to that of [6]. From Lemmas A.1, A.4 and Theorem A.8, we have uniform convergence of $G_dV$ to $GV$ for vectors contained in a set of $\pi$ measure arbitrarily close to 1. Since $C_c^\infty$ separates points (see [4], page 113), the result will follow by [4], Chapter 4, Corollary 8.7, if we can demonstrate the compact containment condition, which in our case follows from the following statement. For all $\epsilon > 0$ and all real-valued $U_0^d = X_{0,1}^d$, we can find $K > 0$ sufficiently large with

$$P\big(U_t^d \notin (-K, K) \text{ for some } 0 \le t \le 1\big) < \epsilon,$$

for all $d$. We appeal directly to the explicit form of the Metropolis transitions and assume that the Lipschitz constant for $g$ is termed $b$. Thus, the following estimates are easy to derive by just noting that squared jumping distances are bounded above by that attained by ignoring rejections. Moreover, these estimates are uniform over all $X_n^d$:

$$-b\sigma_d^2e^{b^2\sigma_d^2/2} \le E[X_{n+1,1}^d - X_{n,1}^d \mid X_n^d] \le b\sigma_d^2e^{b^2\sigma_d^2/2}$$
and

$$E[(X_{n+1,1}^d - X_{n,1}^d)^2 \mid X_n^d] \le \sigma_d^2.$$

Thus, setting $V_n = X_{n,1}^d + nb\sigma_d^2e^{b^2\sigma_d^2/2}$, $\{V_n, 0 \le n \le [d]\}$ is a submartingale with

$$E[(V_{[d]} - V_0)^2] \le 2d\sigma_d^2 + 8\big(db\sigma_d^2e^{b^2\sigma_d^2/2}\big)^2. \tag{A.20}$$

Since $\sigma_d^2 = l^2/d$, the right-hand side of (A.20) is uniformly bounded in $d$, so that the upper bound result follows by Doob's inequality. The lower bound follows similarly by considering the supermartingale $X_{n,1}^d - nb\sigma_d^2e^{b^2\sigma_d^2/2}$. □



A.2. Proofs of Section 4. The proofs of Theorems 4.1 and 4.2 are similar to the proofs of Theorems 1 and 2 in [7], respectively. The only complication in the proofs is that we are updating a random set of components at each iteration in the MALA algorithm.

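Let $x^\infty = (x_1, x_2, \ldots)$ and, for $d \ge 1$, let $x^d = (x_1^d, x_2^d, \ldots, x_d^d)$, where, for $1 \le i \le d$, $x_i^d = x_i$. Thus, we shall again use $x_i^d$ and $x_i$ interchangeably as appropriate. Let $G_d$ be the (discrete-time) generator of $\mathbf{X}^d$ and let $V \in C_c^\infty$ be an arbitrary test function of the first component only. Thus,

$$\begin{aligned} G_dV(x^d) &= d^{1/3}E\Big[(V(Y^d) - V(x^d))\Big\{1 \wedge \frac{\pi_d(Y^d)q(Y^d, x^d)}{\pi_d(x^d)q(x^d, Y^d)}\Big\}\Big] \\ &= c_dd^{1/3}E^*\Big[(V(Y_1^d) - V(x_1^d))\Big\{1 \wedge \frac{\pi_d(Y^d)q(Y^d, x^d)}{\pi_d(x^d)q(x^d, Y^d)}\Big\}\Big], \end{aligned} \tag{A.21}$$

where $E^*$ is defined in Section A.1, after Lemma A.1.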

The generator $G$ of the one-dimensional diffusion described in Theorem 4.2, for an arbitrary test function $V$, is given by

$$GV(x_1) = 2cl^2\Phi\Big(-\frac{\sqrt c\,Kl^3}{2}\Big)\Big\{\frac12 g'(x_1)V'(x_1) + \frac12 V''(x_1)\Big\} = h_c(l)\Big\{\frac12 g'(x_1)V'(x_1) + \frac12 V''(x_1)\Big\}, \tag{A.22}$$

where $K$ and $h_c(l)$ are defined in Section 5.
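The same kind of calculation can be done for (A.22). Assuming the limiting speed has the form $h_c(l) = 2cl^2\Phi(-\sqrt c\,Kl^3/2)$ as written above, the sketch below maximizes it over $l$ for several $c$ with $K = 1$ (an arbitrary choice, since $K$ only rescales $l$). Under this assumption the optimal acceptance rate $2\Phi(-\sqrt c\,Kl^3/2)$ is about 0.574 for every $c$, while the maximal speed scales like $c^{2/3}$, consistent with rescaling MALA efficiencies by $c^{-2/3}$.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def speed_mala(l: float, c: float, K: float) -> float:
    """h_c(l) = 2 c l^2 Phi(-sqrt(c) K l^3 / 2), the form assumed from (A.22)."""
    return 2.0 * c * l * l * Phi(-math.sqrt(c) * K * l**3 / 2.0)


def acceptance_mala(l: float, c: float, K: float) -> float:
    """Limiting acceptance rate 2 Phi(-sqrt(c) K l^3 / 2)."""
    return 2.0 * Phi(-math.sqrt(c) * K * l**3 / 2.0)


def argmax_golden(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Golden-section search for the maximiser of a unimodal function f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:  # maximiser lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:  # maximiser lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)


if __name__ == "__main__":
    K = 1.0  # assumption: K only rescales l, so its value does not affect the optimum
    for c in (0.25, 0.5, 1.0):
        l_hat = argmax_golden(lambda l: speed_mala(l, c, K), 0.01, 10.0)
        print(c, round(l_hat, 4), acceptance_mala(l_hat, c, K), speed_mala(l_hat, c, K))
```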

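The aim thus, as in Section A.1, is to find a sequence of sets $\{F_d \subseteq \mathbb{R}^d\}$ such that, for all $t > 0$,

$$P(\mathbf{U}_s^d \in F_d \text{ for all } 0 \le s \le t) \to 1 \quad \text{as } d \to \infty,$$

and, for $V \in C_c^\infty$,

$$\sup_{x^d \in F_d}|G_dV(x^d) - GV(x_1)| \to 0 \quad \text{as } d \to \infty.$$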






The proofs of Theorems 4.1 and 4.2 are then straightforward.

The first step is therefore to construct the sets $\{F_d \subseteq \mathbb{R}^d\}$. However, this is much more involved than for the RWM-within-Gibbs algorithm in Section A.1. Thus, it will be more convenient to construct the sets $F_d$ through the preliminary lemmas which lead to the proof of Theorems 4.1 and 4.2. The next step will involve a Taylor series expansion of $G_dV(x^d)$ to show that, for large $d$, $GV(x_1)$ is a good approximation for $G_dV(x^d)$. Thus, we begin by studying the logarithm of the acceptance ratio.



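Lemma A.9. There exists a sequence of sets $F_{d,1} \subseteq \mathbb{R}^d$, with $\lim_{d \to \infty}\{d^{1/3}\pi_d(F_{d,1}^c)\} = 0$, such that, for $\chi_1^d = 1$,

$$\log\Big(\frac{f(Y_1^d)q(Y_1^d, x_1^d)}{f(x_1^d)q(x_1^d, Y_1^d)}\Big) = C_3(x_1^d, Z_1)d^{-1/2} + C_4(x_1^d, Z_1)d^{-2/3} + C_5(x_1^d, Z_1)d^{-5/6} + C_6(x_1^d, Z_1)d^{-1} + C_7(x_1^d, Z_1, \sigma_d),$$

where

$$C_3(x_1^d, Z_1) = l^3\Big\{-\frac14 Z_1g'(x_1^d)g''(x_1^d) - \frac{1}{12}Z_1^3g'''(x_1^d)\Big\},$$

and where $C_4(x_1^d, Z_1)$, $C_5(x_1^d, Z_1)$ and $C_6(x_1^d, Z_1)$ are polynomials in $Z_1$ and the derivatives of $g$. Furthermore, if $E_Z$ and $E_X$ denote expectation with $Z \sim N(0, 1)$ and $X$ having density $f(\cdot)$, respectively, then

$$E_XE_Z[C_3(X, Z)] = E_XE_Z[C_4(X, Z)] = E_XE_Z[C_5(X, Z)] = 0, \tag{A.23}$$

whereas

$$E_XE_Z[C_3(X, Z)^2] = l^6K^2 = -2E_XE_Z[C_6(X, Z)]. \tag{A.24}$$

In addition,

$$\sup_{x^d \in F_{d,1}}E^*\Big[\Big|\log\Big(\prod_{i=2}^d\frac{f(Y_i^d)q(Y_i^d, x_i^d)}{f(x_i^d)q(x_i^d, Y_i^d)}\Big) - \Big(\frac{1}{\sqrt d}\sum_{i=2}^d\chi_i^dC_3(x_i^d, Z_i) - \frac12 cl^6K^2\Big)\Big|\Big] \to 0 \quad \text{as } d \to \infty. \tag{A.25}$$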



Proof. With the exception of (A.25) and the exact form of the sets $F_{d,1}$, the lemma is proved in [7], Lemma 1.
For $j = 4, 5, 6$ and $x \in \mathbb{R}$, set $\bar C_j(x) = E_Z[C_j(x, Z)]$ and $V_j(x) = \operatorname{var}_Z(C_j(x, Z))$. The set $F_{d,1,j} = \bigcap_{k=1}^3 F_{d,1,j,k}$, where

$$F_{d,1,j,1} = \Big\{x^d : \Big|\sum_{i=2}^d\{\bar C_j(x_i) - E_X[\bar C_j(X)]\}\Big| < d^{5/8}\Big\}, \qquad F_{d,1,j,2} = \Big\{x^d : \sum_{i=2}^d V_j(x_i) < d^{6/5}\Big\},$$

and $F_{d,1,j,3} = \{x^d : \cdots\}$.

Then, for $j = 4, 5, 6$ and $k = 1, 2, 3$, it is straightforward, using Markov's inequality and conditions (4.1) and (4.2), to show that

$$d^{1/3}\pi_d(F_{d,1,j,k}^c) \to 0 \quad \text{as } d \to \infty.$$

(Cf. [7], Lemma 1, where only the cases $k = 1, 2$ are required.)

Finally, let $\{F_{d,1,7} \subseteq \mathbb{R}^d\}$ correspond to the sets constructed in [7], Lemma 1, and so $d^{1/3}\pi_d(F_{d,1}^c) \to 0$ as $d \to \infty$, where $F_{d,1} = \bigcap_{j=4}^7 F_{d,1,j}$.

The proof of (A.25) is then essentially the same as the proof of the final expression in [7], Lemma 1, and, hence, the details are omitted. □

The next step is to find a convenient approximation for $G_dV(x^d)$ which effectively allows us to consider separately the first component and the remaining $(d-1)$ components.

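Lemma A.10. Let

$$\tilde G_dV(x^d) = c_dd^{1/3}E^*[V(Y_1^d) - V(x_1^d)]\,E^*[e^{B_d} \wedge 1],$$

where

$$B_d\ (= B_d(x^d)) = \sum_{i=2}^d\Big\{(g(Y_i^d) - g(x_i^d)) - \frac{1}{2\sigma_d^2}\Big[\Big(x_i^d - Y_i^d - \frac{\sigma_d^2}{2}g'(Y_i^d)\Big)^2 - \Big(Y_i^d - x_i^d - \frac{\sigma_d^2}{2}g'(x_i^d)\Big)^2\Big]\Big\}.$$

There exist sets $F_{d,2} \subseteq \mathbb{R}^d$ with $\lim_{d \to \infty}d^{1/3}\pi_d(F_{d,2}^c) = 0$ such that, for any $V \in C_c^\infty$,

$$\sup_{x^d \in F_{d,2}}|G_dV(x^d) - \tilde G_dV(x^d)| \to 0 \quad \text{as } d \to \infty.$$

Moreover,

$$\sup_{x^d \in F_{d,2}}\Big|E^*\Big[\frac{\pi_d(Y^d)q(Y^d, x^d)}{\pi_d(x^d)q(x^d, Y^d)} \wedge 1\Big] - E^*[e^{B_d} \wedge 1]\Big| \to 0 \quad \text{as } d \to \infty. \tag{A.26}$$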
Proof. Since, conditional upon $\chi_1^d = 1$, $Y_1^d$ and $Y_-^d$ are independent, it follows that

$$\tilde G_dV(x^d) = c_dd^{1/3}E^*[(V(Y_1^d) - V(x_1^d))(e^{B_d} \wedge 1)].$$

The lemma then follows by identical arguments to those used in [7], Theorem 3, with the sets $\{F_{d,2}\}$ chosen to correspond to the sets $\{S_n\}$ in [7], Theorem 3. □



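Lemma A.11. Let $F_{d,3} = \{x^d : |g'(x_1^d)| < d^{1/12}\}$; then $d^{1/3}\pi_d(F_{d,3}^c) \to 0$ as $d \to \infty$ and, for any $V \in C_c^\infty$,

$$\sup_{x^d \in F_{d,3}}\Big|d^{1/3}c_dE^*[V(Y_1^d) - V(x_1^d)] - \frac{cl^2}{2}\{g'(x_1)V'(x_1) + V''(x_1)\}\Big| \to 0 \quad \text{as } d \to \infty,$$

with $x_1 = x_1^d$.

Proof. The proof is identical to [7], Lemma 2 and is, hence, omitted. □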



We now focus on the remaining $(d-1)$ components. First we introduce the following notation. Let $a(x) = -\frac14 g'(x)g''(x)$ and $b(x) = -\frac{1}{12}g'''(x)$. Therefore, we have that

$$C_3(x, z) = l^3\{a(x)z + b(x)z^3\}.$$

Let $Q_d(x^d; \cdot)$ denote the distribution, under $P^*$, of $\frac{1}{\sqrt d}\sum_{i=2}^d\chi_i^dC_3(x_i^d, Z_i)$. Let $\phi_d(x^d; t) = \int_{\mathbb{R}}\exp(itw)\,Q_d(x^d; dw)$ and let $\phi(t) = \exp(-\frac{t^2}{2}cl^6K^2)$.

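Lemma A.12. There exists a sequence of sets $F_{d,4} \subseteq \mathbb{R}^d$ such that:

(a) $\lim_{d \to \infty}\{d^{1/3}\pi_d(F_{d,4}^c)\} = 0$;

(b) for all $t \in \mathbb{R}$,

$$\sup_{x^d \in F_{d,4}}|\phi_d(x^d; t) - \phi(t)| \to 0 \quad \text{as } d \to \infty;$$

(c) for all bounded continuous functions $r$,

$$\sup_{x^d \in F_{d,4}}\Big|\int_{\mathbb{R}}Q_d(x^d, dy)\,r(y) - \frac{1}{\sqrt{2\pi cl^6K^2}}\int_{\mathbb{R}}r(y)\exp\Big(-\frac{y^2}{2cl^6K^2}\Big)dy\Big| \to 0 \quad \text{as } d \to \infty,$$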
Proof. The sets $F_{d,4}$ are constructed as in the proof of [7], Lemma 3, and so, statement (a) follows. Specifically, we let $F_{d,4}$ be the set of $x^d \in \mathbb{R}^d$ such that

$$\Big|\frac1d\sum_{i=1}^d h(x_i^d) - \int h(x)f(x)\,dx\Big| < \cdots \tag{A.27}$$

and

$$|h(x_i^d)| < d^{3/4}, \quad 1 \le i \le d, \tag{A.28}$$

for each of the functions $h(x) = a(x)^2, b(x)^2, a(x)b(x), a(x)^4, b(x)^4, a(x)^3b(x), a(x)^2b(x)^2, a(x)b(x)^3$.

Since statements (c) and (d) follow from statement (b) as outlined in [7], 
Lemma 3, all that is required is to prove (b). 

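Let $L_d = \{j : \chi_j^d = 1, 2 \le j \le d\}$ and let

$$\theta_j^d(x_j^d, t) = E\Big[\exp\Big(\frac{it}{\sqrt d}C_3(x_j^d, Z_j)\Big)\Big].$$

Note that

$$\phi_d(x^d; t) = E^*\Big[\exp\Big\{\frac{it}{\sqrt d}\sum_{j \in L_d}C_3(x_j^d, Z_j)\Big\}\Big].$$

Then, since $\{C_3(x_j^d, Z_j)\}_{j=2}^d$ are independent random variables, it follows that

$$\phi_d(x^d; t) = E^*\Big[\prod_{j \in L_d}\theta_j^d(x_j^d, t)\Big].$$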

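Therefore,

$$\sup_{x^d \in F_{d,4}}\Big|\phi_d(x^d; t) - E^*\Big[\prod_{j \in L_d}\Big\{1 - \frac{t^2}{2d}v(x_j^d)\Big\}\Big]\Big| \le \sup_{x^d \in F_{d,4}}E^*\Big[\Big|\prod_{j \in L_d}\theta_j^d(x_j^d, t) - \prod_{j \in L_d}\Big\{1 - \frac{t^2}{2d}v(x_j^d)\Big\}\Big|\Big], \tag{A.29}$$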
where $v(x_j) = \operatorname{var}_Z(C_3(x_j, Z)) = l^6\{a(x_j)^2 + 6a(x_j)b(x_j) + 15b(x_j)^2\}$.

The right-hand side of (A.29) converges to $0$ as $d \to \infty$ by arguments similar to those used in [7], Lemma 3. Hence, the details are omitted.
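The formula for $v$ uses only $E[Z^2] = 1$, $E[Z^4] = 3$ and $E[Z^6] = 15$ for $Z \sim N(0, 1)$. The sketch below checks $\operatorname{var}_Z(l^3\{aZ + bZ^3\}) = l^6(a^2 + 6ab + 15b^2)$ by Gauss-Hermite quadrature, which is exact for these polynomial moments; the values standing in for $a = a(x)$, $b = b(x)$ and $l$ are arbitrary.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def c3(a: float, b: float, l: float, z):
    """C_3(x, z) = l^3 {a(x) z + b(x) z^3}, with a = a(x) and b = b(x) given."""
    return l**3 * (a * z + b * z**3)


def var_c3_quadrature(a: float, b: float, l: float, deg: int = 10) -> float:
    """var_Z(C_3(x, Z)) for Z ~ N(0, 1); exact for deg >= 4 (degree-6 integrand)."""
    z, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)  # normalise the weights to a probability measure
    vals = c3(a, b, l, z)
    mean = np.sum(w * vals)
    return float(np.sum(w * vals**2) - mean**2)


def v_closed_form(a: float, b: float, l: float) -> float:
    """l^6 (a^2 + 6 a b + 15 b^2)."""
    return l**6 * (a * a + 6.0 * a * b + 15.0 * b * b)


if __name__ == "__main__":
    for a, b, l in [(0.3, -0.2, 1.0), (-1.5, 0.7, 0.8), (2.0, 0.0, 1.3)]:
        print(var_c3_quadrature(a, b, l), v_closed_form(a, b, l))
```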

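Now, by using a Taylor series expansion for $\exp(-\frac{t^2}{2d}\sum_{j \in L_d}v(x_j))$, it is trivial to show that

$$\sup_{x^d \in F_{d,4}}\Big|E^*\Big[\prod_{j \in L_d}\Big\{1 - \frac{t^2}{2d}v(x_j)\Big\}\Big] - E^*\Big[\exp\Big(-\frac{t^2}{2d}\sum_{j \in L_d}v(x_j)\Big)\Big]\Big| \to 0 \quad \text{as } d \to \infty, \tag{A.30}$$

since, for all $x^d \in F_{d,4}$, $\frac{1}{d^2}\sum_{j=2}^d v(x_j)^2 \to 0$ as $d \to \infty$ (cf. [7], Lemma 3).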
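The final step to complete the proof of statement (b) is to show that

$$\sup_{x^d \in F_{d,4}}\Big|E^*\Big[\exp\Big(-\frac{t^2}{2d}\sum_{j \in L_d}v(x_j)\Big)\Big] - \exp\Big(-\frac{t^2}{2}cl^6K^2\Big)\Big| \to 0 \quad \text{as } d \to \infty.$$

This follows immediately since, using Chebyshev's inequality, we can show that, for all $\epsilon > 0$,

$$\sup_{x^d \in F_{d,4}}P^*\Big(\Big|\frac1d\sum_{j \in L_d}v(x_j) - cl^6K^2\Big| > \epsilon\Big) \to 0 \quad \text{as } d \to \infty.$$

Thus, statement (b) is proved and the lemma follows. □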

We are now in a position to prove Theorems 4.1 and 4.2.

Proof of Theorem 4.1. The theorem follows from (A.25), (A.26) and part (d) of Lemma A.12. □

Proof of Theorem 4.2. We take $F_d = F_{d,1} \cap F_{d,2} \cap F_{d,3} \cap F_{d,4}$. Then

$$d^{1/3}\pi_d(F_d^c) \to 0 \quad \text{as } d \to \infty,$$

and so, for fixed $T$,

$$P(\mathbf{U}_t^d \in F_d,\ 0 \le t \le T) \to 1 \quad \text{as } d \to \infty.$$

Also, from Lemmas A.9-A.12, it follows that

$$\sup_{x^d \in F_d}|G_dV(x^d) - GV(x_1)| \to 0 \quad \text{as } d \to \infty$$

for all $V \in C_c^\infty$ which depend only on the first coordinate. Therefore, the weak convergence follows by [4], Chapter 4, Corollary 8.7, since $C_c^\infty$ separates points and an identical argument to that of Theorem 3.1 can be used to demonstrate compact containment.

The maximization of $h_c(l)$ is straightforward using the proof of [7], Theorem 2. □
A.3. Proofs of Section 5. The proof of Theorem 5.1 is very similar to the proof of Theorem 3.1 given in Section A.1.

First, for $x^\infty \in \mathbb{R}^\infty$, let $x^\infty = (x_1^\infty, x_2^\infty, \ldots)$, $\bar x^d = \frac{1}{d-2}\sum_{i=3}^d x_i^\infty$ and $\bar x = \lim_{d \to \infty}\bar x^d$, should the limit exist. For $x^\infty \in \mathbb{R}^\infty$, let $x^d \in \mathbb{R}^d$ be such that $x^d = (x_1^\infty, x_2^\infty, \ldots, x_d^\infty)$ [$= (x_1^d, x_2^d, \ldots, x_d^d)$, say], that is, $x^d$ comprises the first $d$ components of $x^\infty$. Then let $G_d$ be the (discrete-time) generator of $\mathbf{X}^d$, and let $V \in C_c^\infty$ be an arbitrary test function of $x_1$, $x_2$ and $\bar x$ only (so that $V(x^d)$ is understood as $V(x_1, x_2, \bar x^d)$). Thus,

$$G_dV(x^d) = d\,E\Big[(V(Y^d) - V(x^d))\Big\{1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big].$$

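The generator $G$ of the three-dimensional diffusion described in Theorem 5.1, for an arbitrary test function $V$ of $x_1$, $x_2$ and $\bar x$, is given by

$$GV(x^\infty) = cl^2\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)\sum_{i=1}^2\Big\{-\frac{1}{1-\rho}(x_i - \bar x)\frac{\partial V}{\partial x_i}(x^\infty) + \frac{\partial^2 V}{\partial x_i^2}(x^\infty)\Big\}.$$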



We shall define sets $\{F_d \subseteq \mathbb{R}^\infty; d \ge 1\}$ such that $dP(\mathbf{X}^d \in F_d^c) \to 0$ as $d \to \infty$. This is done in Lemma A.13 and, thus, we can restrict attention to $x^\infty \in F_d$. Furthermore, Lemma A.13 ensures that, for all $x^\infty \in F_d$, $\lim_{d \to \infty}\bar x^d$ exists. Therefore, since we can restrict attention to $x^\infty \in F_d$, we aim to show that

$$\sup_{x^\infty \in F_d}|G_dV(x^d) - GV(x^\infty)| \to 0 \quad \text{as } d \to \infty, \tag{A.31}$$

which is proved in Theorem A.17; Theorem 5.1 then follows trivially.




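Lemma A.13. For $1 \le k \le 5$, define the sequence of sets $\{F_{d,k} \subseteq \mathbb{R}^\infty; d \ge 1\}$ by

$$\begin{aligned} F_{d,1} &= \{x^\infty : |R_d(x^d) - (1-\rho)| < d^{-1/8}\}, \\ F_{d,2} &= \{x^\infty : |\bar x^d - \bar x| < d^{-1/8}\}, \\ F_{d,3} &= \{x^\infty : \max_{1 \le i \le d}|x_i^\infty| < \cdots\}, \\ F_{d,4} &= \{x^\infty : \cdots\}, \\ F_{d,5} &= \{x^\infty : \cdots < d^{1/8}\}, \end{aligned}$$

where $R_d(x^d) = \frac1d\sum_{j=1}^d(x_j - \bar x^d)^2$ and

$$\theta_d = \frac{\rho}{1 + (d-2)\rho - (d-1)\rho^2}.$$

Let $F_d = \bigcap_{k=1}^5 F_{d,k}$; then

$$dP(\mathbf{X}^d \in F_d^c) \to 0 \quad \text{as } d \to \infty. \tag{A.32}$$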
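For the exchangeable normal target with unit variances and correlation $\rho$, $\theta_d$ is the coefficient appearing in the gradient of the log density, $\partial_i\log\pi_d(x^d) = -x_i/(1-\rho) + \theta_d\sum_{j=1}^d x_j$, which is the expression used in Lemmas A.14 and A.15. The sketch below checks this against a direct linear solve with $\Sigma_\rho = (1-\rho)I + \rho\mathbf{1}\mathbf{1}^\top$, which we assume is the covariance used in Section 5; the function names are hypothetical.

```python
import numpy as np


def theta_d(d: int, rho: float) -> float:
    """theta_d = rho / (1 + (d - 2) rho - (d - 1) rho^2)."""
    return rho / (1.0 + (d - 2) * rho - (d - 1) * rho**2)


def grad_log_density(x: np.ndarray, rho: float) -> np.ndarray:
    """Gradient of log N(0, Sigma_rho) at x, using the closed form with theta_d."""
    return -x / (1.0 - rho) + theta_d(x.size, rho) * x.sum()


def grad_log_density_direct(x: np.ndarray, rho: float) -> np.ndarray:
    """The same gradient, -Sigma_rho^{-1} x, via a dense linear solve."""
    d = x.size
    sigma = (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))
    return -np.linalg.solve(sigma, x)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.standard_normal(50)
    print(np.max(np.abs(grad_log_density(x, 0.5) - grad_log_density_direct(x, 0.5))))
```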



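Proof. It is sufficient to show that, for $1 \le k \le 5$,

$$dP(\mathbf{X}^d \in F_{d,k}^c) \to 0 \quad \text{as } d \to \infty.$$

For the cases $k = 1, 3, 4$ and $5$, it is straightforward but tedious to prove the result using Markov's inequality. Therefore, the details are omitted.

For the case $k = 2$, let $\bar X^d = \frac{1}{d-2}\sum_{i=3}^d X_i^\infty$ $(d \ge 3)$ and let $\bar X = \lim_{d \to \infty}\bar X^d$. Therefore, by construction (see Section 5), for all $d \ge 3$,

$$\begin{pmatrix}\bar X^d \\ \bar X\end{pmatrix} \sim N\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}\frac{1 + (d-3)\rho}{d-2} & \rho \\ \rho & \rho\end{pmatrix}\right).$$

Thus,

$$\{\bar X \mid \bar X^d = x\} \sim N\Big(\frac{(d-2)\rho}{1 + (d-3)\rho}x,\ \frac{\rho(1-\rho)}{1 + (d-3)\rho}\Big). \tag{A.33}$$

Therefore, by Markov's inequality,

$$P(\mathbf{X}^d \in F_{d,2}^c) = P(|\bar X^d - \bar X| > d^{-1/8}) \le \sqrt d\,E[|\bar X^d - \bar X|^4], \tag{A.34}$$

and the result follows trivially from (A.33) and (A.34). □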
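The conditional law (A.33) is ordinary Gaussian conditioning applied to the joint law displayed just before it. The sketch below recomputes the conditional mean coefficient and variance from that covariance matrix, and evaluates $d\sqrt d\,E[|\bar X^d - \bar X|^4]$, which bounds $dP(\mathbf{X}^d \in F_{d,2}^c)$ by (A.34). It uses $\operatorname{var}(\bar X^d - \bar X) = (1-\rho)/(d-2)$, which follows from the same covariance matrix, and $E[W^4] = 3\operatorname{var}(W)^2$ for a centred Gaussian $W$.

```python
import math


def conditional_coefficients(d: int, rho: float):
    """Coefficient and variance of Xbar given Xbar^d = x, from the joint covariance
    [[(1 + (d-3) rho)/(d-2), rho], [rho, rho]]."""
    var_xd = (1.0 + (d - 3) * rho) / (d - 2)
    cov, var_x = rho, rho
    coef = cov / var_xd
    cond_var = var_x - cov**2 / var_xd
    return coef, cond_var


def markov_bound_times_d(d: int, rho: float) -> float:
    """d * sqrt(d) * E|Xbar^d - Xbar|^4, the bound on d P(X^d in F_{d,2}^c) from (A.34)."""
    var_diff = (1.0 + (d - 3) * rho) / (d - 2) - 2.0 * rho + rho  # = (1 - rho)/(d - 2)
    return d * math.sqrt(d) * 3.0 * var_diff**2


if __name__ == "__main__":
    for d in (10, 100, 1000, 10000):
        print(d, conditional_coefficients(d, 0.5), markov_bound_times_d(d, 0.5))
```

The bound decays like $d^{-1/2}$, which is what the case $k = 2$ of Lemma A.13 requires.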

The procedure now differs slightly from that given in Section A.1. We postpone the finding of a suitable Taylor series expansion for $G_dV(x^d)$ and first give Lemmas A.14, A.15 and A.16, which mirror Lemmas A.4, A.6 and A.7, respectively. The proofs of the aforementioned lemmas are similar to the proofs of the corresponding results in Section A.1 and, hence, the details are omitted.

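Lemma A.14. For $1 \le k \le d$, let

$$\lambda_k^d\ (= \lambda_k^d(x^d)) = \frac1d\sum_{i \ne k}\chi_i^d\Big(-\frac{x_i}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big)^2.$$

Then, for any $\epsilon > 0$,

$$\sup_{x^d \in F_d}P_k^*\Big(\Big|\lambda_k^d - \frac{c}{1-\rho}\Big| > \epsilon\Big) \to 0 \quad \text{as } d \to \infty,$$

where, for any random variable $X$ and any subset $A \subseteq \mathbb{R}$, $P_k^*(X \in A) = P(X \in A \mid \chi_k^d = 1)$ and $E_k^*[X] = E[X \mid \chi_k^d = 1]$.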
For $z \in \mathbb{R}$, let

$$h_k^d(z)\ (= h_k^d(z; x^d)) = \log\Big\{\frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big|_{Z_k = z}.$$

The role of $h_k^d(z)$ is similar to that played by $h_d(z)$ in Section A.1, with $h_k^d(0)$ equivalent to $B_d$ (cf. Lemma A.3).



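Lemma A.15. For $d \ge 1$ and $1 \le k \le d$, let

$$A_k^d = \sigma_d\sum_{i \ne k}\chi_i^d\Big(-\frac{x_i}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big)Z_i - \frac{l^2}{2}\lambda_k^d.$$

Then,

$$\sup_{x^d \in F_d}|E_k^*[1 \wedge e^{A_k^d}] - E_k^*[1 \wedge e^{h_k^d(0)}]| \to 0 \quad \text{as } d \to \infty, \tag{A.35}$$

and

$$\sup_{x^d \in F_d}|E_k^*[e^{A_k^d}; A_k^d < 0] - E_k^*[e^{h_k^d(0)}; h_k^d(0) < 0]| \to 0 \quad \text{as } d \to \infty.$$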



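Lemma A.16. For $k \ge 1$,

$$\sup_{x^d \in F_d}\Big|E_k^*[1 \wedge e^{A_k^d}] - 2\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)\Big| \to 0 \quad \text{as } d \to \infty \tag{A.36}$$

and

$$\sup_{x^d \in F_d}\Big|E_k^*[e^{A_k^d}; A_k^d < 0] - \Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)\Big| \to 0 \quad \text{as } d \to \infty. \tag{A.37}$$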

We are now in a position to prove (A.31).

Theorem A.17.

$$\sup_{x^\infty \in F_d}|G_dV(x^d) - GV(x^\infty)| \to 0 \quad \text{as } d \to \infty.$$

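Proof. Note that, for all $d \ge 3$, we have the following Taylor series expansion for $V$:

$$\begin{aligned} V(Y^d) - V(x^d) &= \sigma_d\sum_{i=1}^2\chi_i^dZ_i\frac{\partial V}{\partial x_i}(x^d) + \sigma_d\Big(\frac{1}{d-2}\sum_{i=3}^d\chi_i^dZ_i\Big)\frac{\partial V}{\partial \bar x}(x^d) \\ &\quad + \frac{\sigma_d^2}{2}\sum_{i=1}^2\chi_i^dZ_i^2\frac{\partial^2 V}{\partial x_i^2}(x^d) + \sigma_d^2\chi_1^d\chi_2^dZ_1Z_2\frac{\partial^2 V}{\partial x_1\partial x_2}(x^d) \\ &\quad + \sigma_d^2\sum_{i=1}^2\chi_i^dZ_i\Big(\frac{1}{d-2}\sum_{j=3}^d\chi_j^dZ_j\Big)\frac{\partial^2 V}{\partial x_i\partial \bar x}(x^d) + \frac{\sigma_d^2}{2}\Big(\frac{1}{d-2}\sum_{j=3}^d\chi_j^dZ_j\Big)^2\frac{\partial^2 V}{\partial \bar x^2}(x^d) \\ &\quad + \frac{\sigma_d^3}{6}F(x^d, \chi^d, Z) \\ &= \sum_{i=1}^9\mathcal{E}_i(x^d, \chi^d, Z) + \frac{\sigma_d^3}{6}F(x^d, \chi^d, Z), \text{ say}, \end{aligned}$$

where $\mathcal{E}_1, \ldots, \mathcal{E}_9$ denote, in order, the nine first- and second-order terms above, and $F(x^d, \chi^d, Z)$ is a function of $\chi^d = (\chi_1^d, \chi_2^d, \ldots, \chi_d^d)$, $Z$ and the third derivatives of $V(x^d)$. Since $V \in C_c^\infty$, it follows that $\sup_{x^d \in \mathbb{R}^d}E[|F(x^d, \chi^d, Z)|] < \infty$, and so,

$$\sup_{x^d \in F_d}|G_dV(x^d) - \hat G_dV(x^d)| \to 0 \quad \text{as } d \to \infty,$$

where

$$\hat G_dV(x^d) = d\,E\Big[\sum_{i=1}^9\mathcal{E}_i(x^d, \chi^d, Z)\Big\{1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big] = \sum_{i=1}^9 G_i^dV(x^d), \text{ say}.$$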



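Now, for all $x^d \in F_d$, we have that

$$\begin{aligned} G_1^dV(x^d) &= d\sigma_dE\Big[\chi_1^dZ_1\frac{\partial V}{\partial x_1}(x^d)\{1 \wedge \exp(h_1^d(Z_1))\}\Big] \\ &= dc_d\sigma_d\frac{\partial V}{\partial x_1}(x^d)E_1^*[Z_1\{1 \wedge \exp(h_1^d(0))\}] \\ &\quad + dc_d\sigma_d^2\frac{\partial V}{\partial x_1}(x^d)\Big(-\frac{x_1}{1-\rho} + \theta_d\sum_{i=1}^d x_i\Big)E_1^*[Z_1^2\{\exp(h_1^d(0)); h_1^d(0) < 0\}] + \cdots. \end{aligned}$$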
Therefore, since $Z_1$ and $h_1^d(0)$ are independent, it follows that

$$\sup_{x^d \in F_d}\Big|G_1^dV(x^d) - dc_d\sigma_d^2\Big(-\frac{x_1}{1-\rho} + \theta_d\sum_{i=1}^d x_i\Big)\frac{\partial V}{\partial x_1}(x^d)E_1^*[\exp(h_1^d(0)); h_1^d(0) < 0]\Big| \to 0 \quad \text{as } d \to \infty. \tag{A.38}$$



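Now, for all $x^d \in F_d$,

$$\theta_d\sum_{i=1}^d x_i = \theta_d(x_1 + x_2) + \frac{(d-2)\rho}{1 + (d-2)\rho - (d-1)\rho^2}\bar x^d \to \frac{1}{1-\rho}\bar x \quad \text{as } d \to \infty. \tag{A.39}$$

Therefore, it follows from (A.38), (A.39) and Lemmas A.15 and A.16 that

$$\sup_{x^d \in F_d}\Big|G_1^dV(x^d) + \frac{cl^2}{1-\rho}\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)(x_1 - \bar x)\frac{\partial V}{\partial x_1}(x^\infty)\Big| \to 0 \quad \text{as } d \to \infty.$$

Similarly,

$$\sup_{x^d \in F_d}\Big|G_2^dV(x^d) + \frac{cl^2}{1-\rho}\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)(x_2 - \bar x)\frac{\partial V}{\partial x_2}(x^\infty)\Big| \to 0 \quad \text{as } d \to \infty.$$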



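Next, for all $x^d \in F_d$, we have that

$$\begin{aligned} G_3^dV(x^d) &= d\sigma_dE\Big[\frac{1}{d-2}\sum_{i=3}^d\chi_i^dZ_i\frac{\partial V}{\partial \bar x}(x^d)\Big\{1 \wedge \frac{\pi_d(Y^d)}{\pi_d(x^d)}\Big\}\Big] \\ &= dc_d\sigma_d\frac{\partial V}{\partial \bar x}(x^d)\frac{1}{d-2}\sum_{i=3}^d E_i^*[Z_i\{1 \wedge \exp(h_i^d(0))\}] \\ &\quad + dc_d\sigma_d^2\frac{\partial V}{\partial \bar x}(x^d)\frac{1}{d-2}\sum_{i=3}^d E_i^*\Big[Z_i^2\Big(-\frac{x_i}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big)\{\exp(h_i^d(0)); h_i^d(0) < 0\}\Big] + O(d\sigma_d^3d^{1/4}). \end{aligned}$$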

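Therefore, since $Z_i$ and $h_i^d(0)$ are independent, it follows that

$$\sup_{x^d \in F_d}|G_3^dV(x^d) - \tilde G_3^dV(x^d)| \to 0 \quad \text{as } d \to \infty,$$

where

$$\tilde G_3^dV(x^d) = dc_d\sigma_d^2\frac{\partial V}{\partial \bar x}(x^d)\frac{1}{d-2}\sum_{i=3}^d\Big(-\frac{x_i}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big)E_i^*[\exp(h_i^d(0)); h_i^d(0) < 0].$$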



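Let

$$\hat G_3^dV(x^d) = -\frac{cl^2}{1-\rho}\frac{\partial V}{\partial \bar x}(x^d)\frac{1}{d-2}\sum_{i=3}^d E_i^*[\exp(h_i^d(0)); h_i^d(0) < 0](x_i - \bar x).$$

Then since, for all $x^d \in F_d$, $\theta_d\sum_{j=1}^d x_j \to \bar x/(1-\rho)$ and $c_d \to c$ as $d \to \infty$, we have that

$$\sup_{x^d \in F_d}|\tilde G_3^dV(x^d) - \hat G_3^dV(x^d)| \to 0 \quad \text{as } d \to \infty.$$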



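By Lemmas A.15 and A.16, for all $x^d \in F_d$ and $i \ge 3$,

$$E_i^*[\exp(h_i^d(0)); h_i^d(0) < 0] \to \Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big) \quad \text{as } d \to \infty.$$

Therefore, since $\bar x^d = \frac{1}{d-2}\sum_{i=3}^d x_i \to \bar x$ as $d \to \infty$, we have that

$$\frac{1}{d-2}\sum_{i=3}^d E_i^*[\exp(h_i^d(0)); h_i^d(0) < 0](x_i - \bar x) \to 0 \quad \text{as } d \to \infty.$$

Hence,

$$\sup_{x^d \in F_d}|\hat G_3^dV(x^d)| \to 0 \quad \text{as } d \to \infty.$$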



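Now, for all $x^d \in F_d$, we have, by independence, that

$$\begin{aligned} G_4^dV(x^d) &= \frac12 d\sigma_d^2c_dE_1^*\Big[Z_1^2\frac{\partial^2 V}{\partial x_1^2}(x^d)\{1 \wedge \exp(h_1^d(Z_1))\}\Big] \\ &= \frac12 d\sigma_d^2c_d\frac{\partial^2 V}{\partial x_1^2}(x^d)E_1^*[1 \wedge \exp(h_1^d(0))] + O(d\sigma_d^3d^{1/8}), \end{aligned}$$

since, for all $x^d \in F_d$, $\big|-\frac{x_1}{1-\rho} + \theta_d\sum_{i=1}^d x_i\big| \le d^{1/8}$. Therefore, by Lemmas A.15 and A.16, we have that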



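$$\sup_{x^d \in F_d}\Big|G_4^dV(x^d) - cl^2\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)\frac{\partial^2 V}{\partial x_1^2}(x^\infty)\Big| \to 0 \quad \text{as } d \to \infty.$$

Similarly,

$$\sup_{x^d \in F_d}\Big|G_5^dV(x^d) - cl^2\Phi\Big(-\frac l2\sqrt{\frac{c}{1-\rho}}\Big)\frac{\partial^2 V}{\partial x_2^2}(x^\infty)\Big| \to 0 \quad \text{as } d \to \infty.$$

There exists $\alpha_1$ lying between $0$ and $Z_1$ such that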



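$$\begin{aligned} G_6^dV(x^d) &= d\sigma_d^2E\Big[\chi_1^d\chi_2^dZ_1Z_2\frac{\partial^2 V}{\partial x_1\partial x_2}(x^d)\{1 \wedge \exp(h_1^d(0))\}\Big] \\ &\quad + d\sigma_d^3\Big(-\frac{x_1}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big)E\Big[\chi_1^d\chi_2^dZ_1^2Z_2\frac{\partial^2 V}{\partial x_1\partial x_2}(x^d)\{\exp(h_1^d(\alpha_1)); h_1^d(\alpha_1) < 0\}\Big]. \end{aligned}$$

Note that $Z_1$ is independent of $\chi_2^d$, $Z_2$ and $h_1^d(0)$. Therefore, since $E_1^*[Z_1] = 0$, we have that

$$|G_6^dV(x^d)| \le d\sigma_d^3\Big|-\frac{x_1}{1-\rho} + \theta_d\sum_{j=1}^d x_j\Big|\,\sup\Big|\frac{\partial^2 V}{\partial x_1\partial x_2}\Big|\,E[Z_1^2|Z_2|]. \tag{A.40}$$

Since $\frac{\partial^2 V}{\partial x_1\partial x_2}(x^d)$ is bounded and, for all $x^d \in F_d$, $\big|-\frac{x_1}{1-\rho} + \theta_d\sum_{j=1}^d x_j\big| < d^{1/8}$, it follows that the right-hand side of (A.40) converges to $0$ as $d \to \infty$. Hence, $|G_6^dV(x^d)| \to 0$ as $d \to \infty$.

Similarly, for $i = 7, 8, 9$, it can be shown that $|G_i^dV(x^d)| \to 0$ as $d \to \infty$, and the theorem follows. □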



Proof of Theorem 5.1. The proof is similar to that of Theorem 3.1. From Lemma A.13 and Theorem A.17, we have that $dP(\mathbf{X}^d \in F_d^c) \to 0$ as $d \to \infty$ and

$$\sup_{x^\infty \in F_d}|G_dV(x^d) - GV(x^\infty)| \to 0 \quad \text{as } d \to \infty,$$

respectively. Therefore, the weak convergence follows by [4], Chapter 4, Corollary 8.7, since $C_c^\infty$ separates points and a similar argument to that of Theorem 3.1 can be used to demonstrate compact containment. □

The proof of Theorem 5.3 is similar to the proof of Theorem 5.1 and, hence, the details are omitted. The key point is to show that Lemma A.13 still holds with (A.32) replaced by

$$d^2P(\mathbf{X}^d \in F_d^c) \to 0 \quad \text{as } d \to \infty.$$

This is again straightforward, but tedious, using Markov's inequality.